What is Feature Scaling, and why and in what context is it useful?

Feature scaling is a data pre-processing technique that transforms a variable from its original range to a different scale. In algorithms whose cost function contains a distance measure, such as regularized regression or K-means clustering, feature scaling prevents features from dominating the objective simply because they are measured on a larger scale. It also improves the stability and convergence of algorithms trained with an iterative technique like gradient descent, such as neural networks.

The two most common feature scaling approaches are:

(a) Standardization, which subtracts the mean from each value and then divides by the standard deviation, yielding a variable with zero mean and unit variance, and

(b) MinMax scaling, which transforms the variable to a range of [0,1] by subtracting the minimum value and dividing by the range.
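The two transformations above can be sketched in a few lines of NumPy (the toy array `X_train` is made up for illustration):

```python
import numpy as np

# Toy training matrix: two features on very different scales.
X_train = np.array([[1.0, 200.0],
                    [2.0, 300.0],
                    [3.0, 400.0]])

# Standardization: (x - mean) / std, computed per feature (column).
mu = X_train.mean(axis=0)
sigma = X_train.std(axis=0)
X_std = (X_train - mu) / sigma

# MinMax scaling: (x - min) / (max - min), mapping each feature to [0, 1].
x_min = X_train.min(axis=0)
x_max = X_train.max(axis=0)
X_minmax = (X_train - x_min) / (x_max - x_min)
```

After standardization each column of `X_std` has mean 0 and standard deviation 1; after MinMax scaling each column of `X_minmax` spans exactly [0, 1], so both features contribute on a comparable scale to any distance computation.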

It is important to fit a scaler object only on the training data and then apply it to validation and test data using the parameters (e.g., mean and standard deviation) learned from the training set; fitting on held-out data would leak information about it into the model, causing data leakage.
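A minimal sketch of this fit-on-train-only pattern, using scikit-learn's `StandardScaler` (the random data here is purely illustrative):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_train = rng.normal(loc=5.0, scale=2.0, size=(100, 3))
X_val = rng.normal(loc=5.0, scale=2.0, size=(20, 3))

scaler = StandardScaler()
# Learn the mean and standard deviation from the training data only.
X_train_scaled = scaler.fit_transform(X_train)
# Reuse the training-set parameters on the validation data; never refit here.
X_val_scaled = scaler.transform(X_val)
```

Because the scaler's parameters come from `X_train` alone, `X_train_scaled` has exactly zero mean per feature, while `X_val_scaled` will be close to, but not exactly, zero-mean; calling `fit` (or `fit_transform`) on the validation data instead would be the leakage the answer warns against.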