What does L2 regularization (Ridge) mean?

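L2, or Ridge, regularization is a form of regularization in which the penalty is based on the squared magnitude of the coefficients. The L2 cost function is the usual sum of squared residuals plus a penalty equal to lambda times the sum of the squared coefficients. Just as in LASSO, lambda is the parameter that controls the amount of regularization applied: a lambda of 0 gives back ordinary least squares, and larger values shrink the coefficients more strongly.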
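
Written out, the Ridge cost function takes the form below. The notation is the conventional one and is an assumption here, not taken from the original: n observations, p predictors, coefficients beta_j, and an intercept beta_0 that is left out of the penalty.

```latex
J(\beta) = \sum_{i=1}^{n} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\,\beta_j \Bigr)^{2} + \lambda \sum_{j=1}^{p} \beta_j^{2}
```

The LASSO cost function has the same first term and differs only in the penalty, which uses the absolute value of each coefficient in place of its square.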

The major difference between Ridge and LASSO is that Ridge shrinks coefficients toward 0 but does not set any of them exactly to 0. Thus, it does not have a built-in variable selection capability. However, the least important predictors can end up with coefficients that are very close to 0. In both LASSO and Ridge regression, the magnitude of a coefficient reflects the size of that predictor's effect in the model, provided the features are on comparable scales.
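
The difference in shrinkage behavior can be sketched in a simplified setting. The example below assumes an orthonormal design (uncorrelated, unit-length feature columns) and a cost of the form "sum of squared residuals + lambda * penalty". Under those assumptions each coefficient has a closed-form solution: Ridge divides the least-squares coefficient by (1 + lambda), while LASSO subtracts lambda/2 from its magnitude and clips at 0. The function names and the coefficient values are hypothetical, chosen only for illustration.

```python
def ridge_shrink(beta_ols, lam):
    """Ridge coefficient under an orthonormal design: proportional shrinkage."""
    return beta_ols / (1.0 + lam)


def lasso_shrink(beta_ols, lam):
    """LASSO coefficient under an orthonormal design: soft-thresholding."""
    magnitude = max(abs(beta_ols) - lam / 2.0, 0.0)
    return magnitude if beta_ols >= 0 else -magnitude


# Hypothetical least-squares coefficients: two strong predictors, one weak.
ols_coefs = [3.0, -1.5, 0.2]
lam = 1.0

ridge_coefs = [ridge_shrink(b, lam) for b in ols_coefs]
lasso_coefs = [lasso_shrink(b, lam) for b in ols_coefs]

print("Ridge:", ridge_coefs)  # every coefficient is smaller, none is exactly 0
print("LASSO:", lasso_coefs)  # the weak predictor is set exactly to 0
```

With real, correlated features the closed forms above no longer apply, but the qualitative behavior is the same: Ridge keeps every predictor with a reduced coefficient, while LASSO can drop predictors entirely.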

An important data pre-processing step in regularized regression is to scale the features before fitting the model, using a technique such as standardization or min-max scaling. The need for feature scaling comes from the penalty term, the second term in the loss function, which is where the shrinkage occurs. The penalty depends only on the size of each coefficient, and the size of a coefficient depends on the units of its feature. A feature measured on a much larger scale than the others needs only a small coefficient to have the same effect on the prediction, so the penalty barely touches it, and regularization cannot sufficiently shrink its influence even if it is not an important predictor. To apply an equal magnitude of regularization to all of the features, convert them to a similar scale first.
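
The effect of scale on shrinkage can be illustrated with a minimal sketch. It assumes the simplest possible case: a single centered feature with an unpenalized intercept, where the Ridge coefficient equals the least-squares coefficient multiplied by Sxx / (Sxx + lambda), with Sxx the sum of squared deviations of the feature from its mean. The data and the helper names are hypothetical. The same heights are expressed in meters and in millimeters, so both versions carry identical information.

```python
from statistics import mean, pstdev


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def ridge_shrink_factor(values, lam):
    """Ratio of the Ridge to the least-squares coefficient for one centered feature."""
    mu = mean(values)
    sxx = sum((v - mu) ** 2 for v in values)
    return sxx / (sxx + lam)


# Hypothetical data: the same heights in two different units.
height_m = [1.5, 1.6, 1.7, 1.8, 1.9]
height_mm = [h * 1000 for h in height_m]
lam = 1.0

raw_m = ridge_shrink_factor(height_m, lam)    # small scale: heavily shrunk
raw_mm = ridge_shrink_factor(height_mm, lam)  # large scale: almost untouched

scaled_m = ridge_shrink_factor(standardize(height_m), lam)
scaled_mm = ridge_shrink_factor(standardize(height_mm), lam)

print("Unscaled:    ", raw_m, raw_mm)
print("Standardized:", scaled_m, scaled_mm)  # same factor for both units
```

A factor near 1 means the penalty leaves the feature's influence almost unchanged, and a factor near 0 means it is shrunk heavily. Before scaling, the choice of units alone decides how strongly the feature is regularized; after standardization, both versions are shrunk by the same amount.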