

Among the common machine learning algorithms, which require feature scaling, and which do not?


As a general rule of thumb, if any component of an algorithm's objective function involves a distance measure, whether between observations or to a central location, the data should be scaled before training. If the algorithm is rule-based, such as a decision tree, scaling is not necessary. Even when there is no explicit need to do so, it is rarely wrong to scale the data, but the transformed scale should be kept in mind during interpretation (for example, coefficients fit on standardized features are expressed per standard deviation rather than in the original units). Using this heuristic, the following is a (non-exhaustive) mapping of where some of the most common algorithms fit.
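
To make the distance intuition concrete, below is a minimal sketch (the feature values are made up purely for illustration) showing how a feature on a large scale can dominate a Euclidean distance, and how z-score standardization evens out the contributions:

    import numpy as np

    # Hypothetical two-feature observations: income in dollars, age in years.
    a = np.array([50_000.0, 25.0])
    b = np.array([52_000.0, 60.0])

    # The raw Euclidean distance is dominated almost entirely by the income column;
    # the 35-year age gap barely registers.
    print(np.linalg.norm(a - b))  # ~2000.3

    # After z-score standardization, both features contribute on a comparable scale.
    X = np.vstack([a, b])
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)
    print(np.linalg.norm(X_std[0] - X_std[1]))  # ~2.83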

Scaling is Necessary

  • Neural Networks (mainly to aid convergence of the gradient descent optimizer)
  • Regularized Regression (Ridge, LASSO, Elastic Net, etc.)
  • Support Vector Machine
  • K-Nearest Neighbors (a pipeline sketch follows this list)
  • K-Means
  • Dimensionality Reduction (PCA, Factor Analysis)
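
For the distance-based methods above, a minimal scikit-learn sketch (assuming scikit-learn is available; the bundled breast-cancer dataset and n_neighbors=5 are arbitrary illustrative choices) shows the usual pattern of wrapping the scaler and the model in a Pipeline, so the scaling parameters are learned from the training folds only:

    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    X, y = load_breast_cancer(return_X_y=True)

    # k-NN without scaling: features measured in large units dominate the distance metric.
    unscaled_knn = KNeighborsClassifier(n_neighbors=5)

    # Same model with z-score standardization applied inside the pipeline.
    scaled_knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))

    print("unscaled:", cross_val_score(unscaled_knn, X, y, cv=5).mean())
    print("scaled:  ", cross_val_score(scaled_knn, X, y, cv=5).mean())

On data with mixed units, the scaled pipeline typically scores higher; the same pattern applies to SVMs, K-Means, and PCA.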

Scaling is Not Necessary

  • Ordinary Regression (standard linear regression or GLMs without regularization)
    • However, if optimization is done via gradient descent rather than a closed-form solution, scaling the data still helps convergence.
  • Decision Tree Methods (CART, Random Forest, GBM, etc.); splits depend only on the ordering of feature values (see the check after this list)
  • Naive Bayes
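
As a quick check of the rule-based point, the following sketch (again assuming scikit-learn) confirms that standardizing the features does not change a decision tree's predictions, since splits depend only on the ordering of values within each feature:

    import numpy as np
    from sklearn.datasets import load_breast_cancer
    from sklearn.preprocessing import StandardScaler
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)
    X_std = StandardScaler().fit_transform(X)

    # Fit identical trees on raw and standardized features.
    tree_raw = DecisionTreeClassifier(random_state=0).fit(X, y)
    tree_std = DecisionTreeClassifier(random_state=0).fit(X_std, y)

    # Standardization is a monotonic transform of each feature, so the learned
    # splits correspond and the predictions agree.
    print(np.array_equal(tree_raw.predict(X), tree_std.predict(X_std)))  # expected: True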
