What is the purpose of feature selection, and what are some common approaches?

Feature selection seeks to reduce the feature space by eliminating candidate predictors that have little predictive power for the target of interest. In most situations, a parsimonious model is preferred, provided it still performs at a satisfactory level of accuracy. A simpler model usually results in clearer interpretation and shorter training times, in addition to having a reduced risk of overfitting. The following are some of the most common feature selection approaches:

  • Filter methods: These approaches use a statistical measure, such as correlation or a p-value, between an input variable and the target as the criterion for inclusion or exclusion. The researcher usually sets a threshold and drops variables that fail to meet it, for example, variables whose absolute correlation with the target falls below 0.5.
  • Wrapper methods: These approaches build models from many different subsets of the candidate predictors and ultimately choose the combination of variables that yields the best model among those tried. A common wrapper method is Recursive Feature Elimination (RFE), which recursively removes the least important feature from the model until the desired number or proportion of the original features remains. Because wrapper methods fit multiple models over different combinations of the feature space, they can take a long time to complete when there are many input features. Forward, backward, and stepwise selection in the regression context are also examples of wrapper methods.
  • Automatic feature selection: Some algorithms, like LASSO, have feature selection built in: they shrink the coefficients of features with little predictive ability all the way to zero, thereby eliminating those features from the model.
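
A filter method of the kind described above might be sketched as follows. This is a minimal illustration on synthetic data, using the 0.5 correlation cutoff mentioned in the text; the variable names and the data-generating process are invented for the example.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200
# Synthetic data: x1 drives the target, x2 is pure noise.
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 2.0 * x1 + rng.normal(scale=0.5, size=n)
df = pd.DataFrame({"x1": x1, "x2": x2, "y": y})

# Filter step: keep only features whose absolute correlation
# with the target meets the researcher-chosen threshold.
threshold = 0.5
correlations = df.drop(columns="y").corrwith(df["y"]).abs()
selected = correlations[correlations >= threshold].index.tolist()
print(selected)  # x1 survives the filter, x2 is dropped
```

Note that the filter is applied once, before any model is fit, which is what makes filter methods cheap relative to wrapper methods.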
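
The RFE wrapper method mentioned above is available in scikit-learn; a minimal sketch on synthetic data (the dataset sizes and the choice of a linear model as the wrapped estimator are assumptions for the example) might look like this:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

# Synthetic data: 10 candidate features, only 3 carry signal.
X, y = make_regression(n_samples=300, n_features=10,
                       n_informative=3, random_state=0)

# Recursively refit the model, dropping the least important
# feature each round, until 3 features remain.
rfe = RFE(estimator=LinearRegression(), n_features_to_select=3)
rfe.fit(X, y)

kept = np.flatnonzero(rfe.support_)  # indices of retained features
```

Each elimination round requires refitting the estimator, which is why wrapper methods become expensive as the number of candidate features grows.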
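
The built-in selection that LASSO performs can be seen directly in its fitted coefficients. A minimal sketch, again on synthetic data, with the regularization strength `alpha=1.0` chosen arbitrarily for the example:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Synthetic data: 10 candidate features, only 3 carry signal.
X, y = make_regression(n_samples=300, n_features=10,
                       n_informative=3, noise=1.0, random_state=0)

# The L1 penalty shrinks the coefficients of uninformative
# features exactly to zero, removing them from the model.
lasso = Lasso(alpha=1.0)
lasso.fit(X, y)

kept = np.flatnonzero(lasso.coef_ != 0)
dropped = np.flatnonzero(lasso.coef_ == 0)
```

In practice `alpha` is tuned (e.g. via cross-validation with `LassoCV`), since it controls how aggressively features are eliminated.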