How can overfitting be mitigated?

  • Collect more training data: A larger data set exposes the model to a greater variety of examples, which tends to reduce the variance of its predictions on future observations. However, if the additional data carries no new information, more data alone will not necessarily improve performance. In some applications collecting data is very time consuming, so simply seeking more observations may not be the most practical approach. What matters most is having enough data to cover the feature space adequately.
  • Use regularization: Regularization reduces the effective complexity of the model by adding a penalty on coefficient size to the training objective, which shrinks the coefficients toward 0, especially for variables that are least predictive of the target.
  • Constrain model complexity through hyperparameters (if applicable to the algorithm): As with regularization, choosing hyperparameter values that yield a simpler model, for example a smaller maximum depth for a decision tree, prevents the model from fitting the training data so closely that it fails to generalize to a new sample of data.
  • Implement early stopping: Early stopping terminates training when the model's performance on held-out validation data has not improved by a meaningful amount over several iterations. This shortens training time and mitigates overfitting, because training ends before the model starts fitting noise in the training data.
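
To make the regularization and early stopping points concrete, here is a minimal sketch in Python using only NumPy. It is an illustration under stated assumptions, not a reference implementation: the function name fit_ridge_early_stopping, the synthetic data, and the parameter values (l2, lr, patience, min_delta) are all hypothetical choices made for this example. It fits a linear model by gradient descent with an L2 penalty and stops once the validation loss has stopped improving.

```python
import numpy as np


def fit_ridge_early_stopping(X_train, y_train, X_val, y_val, l2=0.1, lr=0.01,
                             max_iters=5000, patience=20, min_delta=1e-6):
    """Gradient descent on MSE + l2 * ||w||^2, stopped early on validation MSE.

    All names and default values here are illustrative assumptions.
    """
    w = np.zeros(X_train.shape[1])
    best_w, best_val = w.copy(), np.inf
    stale = 0      # consecutive iterations without a meaningful improvement
    n_iters = 0
    n = len(y_train)
    for _ in range(max_iters):
        n_iters += 1
        residual = X_train @ w - y_train
        # Gradient of the data term plus gradient of the L2 penalty.
        grad = (2.0 / n) * (X_train.T @ residual) + 2.0 * l2 * w
        w = w - lr * grad
        val_loss = np.mean((X_val @ w - y_val) ** 2)
        if val_loss < best_val - min_delta:
            best_val, best_w, stale = val_loss, w.copy(), 0
        else:
            stale += 1
            if stale >= patience:
                break  # early stopping: validation loss has plateaued
    return best_w, best_val, n_iters


# Synthetic data (an assumption for the demo): two of five features are irrelevant.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
true_w = np.array([3.0, -2.0, 0.5, 0.0, 0.0])
y = X @ true_w + rng.normal(scale=0.5, size=300)
X_train, y_train = X[:200], y[:200]
X_val, y_val = X[200:], y[200:]

w_weak, val_weak, iters_weak = fit_ridge_early_stopping(
    X_train, y_train, X_val, y_val, l2=0.0)
w_strong, val_strong, iters_strong = fit_ridge_early_stopping(
    X_train, y_train, X_val, y_val, l2=10.0)

print("no penalty:     ||w|| =", np.linalg.norm(w_weak), "iterations =", iters_weak)
print("strong penalty: ||w|| =", np.linalg.norm(w_strong), "iterations =", iters_strong)
```

Comparing the two runs shows the shrinkage effect: the heavily penalized fit ends with a much smaller coefficient norm. A penalty this strong would likely underfit in practice, so the penalty strength is itself something to tune against validation data rather than to maximize.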