What are some options to address overfitting in Neural Networks?

One of the main drawbacks of deep learning is that it tends to be more prone to overfitting than more traditional machine learning models. However, several techniques can mitigate that risk.

  • Use Regularization: The most common form of regularization used in deep learning is L2 regularization, which adds a penalty proportional to the sum of the squared weights to the loss function. In Neural Networks, this shrinks the magnitude of the weights, so the input to a given hidden unit's activation function tends to fall in the more linear region of that function (for example, near zero for sigmoid or tanh). The practical effect is a less complex decision function.
  • Implement Early Stopping: Early stopping terminates training once the loss, typically measured on a held-out validation set, has failed to improve by more than a small threshold for a set number of iterations. This makes it possible to set the maximum number of iterations to a large value: assuming the loss bottoms out before the final iteration, training halts there instead of running to the end. Halting at that point keeps the model from continuing to fit noise in the training data, and it also conserves computing resources.
  • Use Dropout: Dropout refers to randomly turning off hidden units during training so that a smaller network is trained on each pass. Each node in the hidden layers has some probability of being turned off, so over many iterations the data is fed through different, simpler networks, which results in lower variance than using the same, more complex model on every pass. Dropout thus approximates the variance reduction of an ensemble of many smaller networks while training only a single model.
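
The three options above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not a reference implementation: the names `l2_penalty`, `dropout`, `EarlyStopping`, `lam`, `p_drop`, `patience`, and `min_delta` are hypothetical, the dropout shown is the common "inverted" variant (survivors are rescaled so the expected activation is unchanged), and the validation losses are made-up numbers chosen to show the stopping rule.

```python
import random


def l2_penalty(weights, lam):
    """L2 term added to the loss: lam times the sum of squared weights."""
    return lam * sum(w * w for w in weights)


def dropout(activations, p_drop, rng, training=True):
    """Inverted dropout: zero each unit with probability p_drop and
    rescale the survivors by 1 / (1 - p_drop). A no-op outside training."""
    if not training or p_drop == 0.0:
        return list(activations)
    keep = 1.0 - p_drop
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


class EarlyStopping:
    """Signal a stop once the validation loss has failed to improve by
    more than min_delta for `patience` consecutive checks."""

    def __init__(self, patience=3, min_delta=1e-4):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_checks = 0

    def should_stop(self, val_loss):
        if val_loss < self.best - self.min_delta:
            self.best = val_loss      # new best: reset the counter
            self.bad_checks = 0
        else:
            self.bad_checks += 1      # no meaningful improvement
        return self.bad_checks >= self.patience


# Illustrative validation losses: they improve, then drift upward.
val_losses = [1.00, 0.80, 0.70, 0.69, 0.71, 0.72, 0.74, 0.90]
stopper = EarlyStopping(patience=3, min_delta=0.0)
stopped_at = None
for epoch, loss in enumerate(val_losses):
    if stopper.should_stop(loss):
        stopped_at = epoch            # training would halt here
        break
```

In a real training loop the penalty would be added to the data loss before computing gradients, `dropout` would be applied to hidden-layer activations with `training=False` at inference time, and `should_stop` would be called after each validation pass. Deep learning frameworks generally ship their own versions of all three, which are preferable in practice.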