What are some methods of Variable Selection?

Some of the most common methods of variable selection are Forward Selection, Backward Selection, and Stepwise Selection.

  • Forward Selection: Forward selection starts with a null, or empty, model and begins by adding the predictor that most improves the model's fit according to a criterion such as AIC or R². It does this by fitting a separate regression model for each candidate predictor and identifying the variable that gives the best of all possible simple regression models. The algorithm then searches for the next variable that most improves the fit, given the variable already included in the model. Once a variable is added, it cannot later be removed. The procedure continues adding predictors in this way until no remaining variable improves the fit by more than a predetermined threshold.
  • Backward Selection: Backward selection works in the opposite direction from forward selection, beginning with a full model, that is, one that includes all candidate terms. It first eliminates the variable that is least important to the model according to the chosen criterion, for example, the term whose removal leads to the smallest decrease in R² or the smallest increase in RMSE. Once a variable is eliminated, it cannot be added back to the model. The algorithm continues eliminating variables until a stopping criterion is reached or there is nothing left to remove.

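Note: In Python, the closest long-standing relative of backward selection is Recursive Feature Elimination (RFE) in scikit-learn, which repeatedly drops the least important feature as ranked by the estimator's coefficients or feature importances rather than by the change in a fit criterion. scikit-learn 0.24 and later also provides SequentialFeatureSelector, which performs forward or backward selection using a cross-validated score.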

  • Stepwise Selection: Stepwise selection combines elements of both forward and backward selection. It starts like the forward approach, with a null model, and identifies the first variable to add based on the chosen evaluation criterion. However, after each forward step a backward step is also performed, meaning that a predictor that has been added can be taken out at a later step if removing it improves the fit. The algorithm continues alternating between adding and, if necessary, removing variables until there is nothing left to change.
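
The forward procedure described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than a reference implementation: the helper names (aic, forward_select), the min_improvement threshold, and the synthetic data are all assumptions made for the example, and the AIC is computed only up to an additive constant for an ordinary least squares fit with Gaussian errors.

```python
import numpy as np

def aic(X, y):
    """AIC (up to an additive constant) of an OLS fit of y on X.

    X must already contain the intercept column.
    """
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    return n * np.log(rss / n) + 2 * k

def forward_select(X, y, min_improvement=0.0):
    """Greedy forward selection: return column indices in the order they were added."""
    n, p = X.shape
    intercept = np.ones((n, 1))
    selected = []
    best = aic(intercept, y)  # null model: intercept only
    while len(selected) < p:
        remaining = [j for j in range(p) if j not in selected]
        # Fit one candidate model per remaining predictor, given those already chosen.
        scores = {
            j: aic(np.hstack([intercept, X[:, selected + [j]]]), y)
            for j in remaining
        }
        j_best = min(scores, key=scores.get)  # lower AIC is better
        if best - scores[j_best] <= min_improvement:
            break  # no candidate improves the fit enough
        selected.append(j_best)  # once added, never removed
        best = scores[j_best]
    return selected

# Hypothetical data: only columns 0 and 2 actually drive the response.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3 * X[:, 0] - 2 * X[:, 2] + rng.normal(scale=0.5, size=200)

selected = forward_select(X, y)
print(selected)
```

Backward selection follows the same pattern in reverse: start with every column selected and, at each step, drop the column whose removal lowers the AIC the most. Stepwise selection adds that drop check after every forward step.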