Related Questions:
– How to mitigate Overfitting?
– What is Underfitting?
– What is the Bias-Variance Tradeoff?
Overfitting occurs when a machine learning model becomes too complex and starts fitting the training data too closely. This causes the model to learn the noise and random fluctuations in the training data instead of the underlying patterns and relationships that are relevant to the problem being solved. As a result, the model may perform very well on the training data but poorly on new, unseen data.
The best way to identify whether a model is overfitted is to compare the training and test error. Training error is generally lower than test error, and the goal of any machine learning model is to minimize both: a) the training error, and b) the gap between training and test error. If the training error is low but the difference between training and test error is significant, the model is likely in the overfitting zone. The following figure illustrates how training and test error can help identify whether a model is underfitted, optimally fitted, or overfitted:

One can think of Overfitting as a form of “memorization” of the training data, where the model becomes too specialized to the training data and loses its ability to generalize to new data.
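This train/test gap is easy to see in code. Below is a minimal sketch (the dataset, model, and numbers are illustrative assumptions, not from the article): an unconstrained decision tree "memorizes" a small noisy dataset, scoring near-perfectly on the training split while doing noticeably worse on the held-out split.

```python
# Illustrative sketch: compare training vs. test accuracy to spot overfitting.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Small, noisy dataset (flip_y adds 20% label noise).
X, y = make_classification(n_samples=200, n_features=20, flip_y=0.2,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

# An unconstrained tree can grow until it fits the training set perfectly.
tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
train_acc = tree.score(X_tr, y_tr)   # near 1.0: the tree memorizes the noise
test_acc = tree.score(X_te, y_te)    # noticeably lower on unseen data
print(f"train={train_acc:.2f}  test={test_acc:.2f}  gap={train_acc - test_acc:.2f}")
```

A large gap between the two scores is the signature of memorization rather than generalization.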
Causes of Overfitting and How to Mitigate It
There are several common causes of overfitting. One is using a model that is too complex for the given dataset. For example, a decision tree with too many levels or a neural network with too many layers and hidden units may be more complex than is necessary to solve the problem at hand. Similarly, having too many features can also make a model complex.
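The effect of excess model complexity can be demonstrated with polynomial regression. The sketch below (a toy setup with assumed degrees and noise levels, not taken from the article) generates data from a quadratic plus noise: a degree-2 fit matches the underlying pattern, while a degree-15 fit has enough free parameters to chase the noise and typically does worse on fresh data.

```python
# Illustrative sketch: a too-high polynomial degree overfits noisy data.
import numpy as np

rng = np.random.default_rng(0)
x_tr = rng.uniform(-1, 1, 30)
y_tr = x_tr**2 + rng.normal(0, 0.1, 30)     # true signal is quadratic
x_te = rng.uniform(-1, 1, 100)
y_te = x_te**2 + rng.normal(0, 0.1, 100)

def poly_test_mse(degree):
    coeffs = np.polyfit(x_tr, y_tr, degree)  # least-squares polynomial fit
    pred = np.polyval(coeffs, x_te)
    return np.mean((pred - y_te) ** 2)

mse_simple = poly_test_mse(2)    # matches the data-generating process
mse_complex = poly_test_mse(15)  # extra terms are free to fit the noise
print(f"degree 2 test MSE:  {mse_simple:.4f}")
print(f"degree 15 test MSE: {mse_complex:.4f}")
```

The degree here plays the same role as tree depth or network size: it is a knob controlling how closely the model can hug the training points.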
Another cause of overfitting is using too few examples to train the model, which makes it harder to find the underlying patterns in the data. In that case, getting more training data can help.
In addition to these causes, overfitting can also occur when the model is not regularized properly. Regularization techniques, such as L1 or L2 regularization, add a penalty term to the model’s loss function that discourages it from fitting the training data too closely. Dropout, another regularization technique, randomly drops out some of the neurons in a neural network during training, which can help prevent the network from memorizing the training data.
Finally, overfitting can be mitigated by using cross-validation, which simulates evaluating the model’s performance on new, unseen data. Cross-validation involves splitting the dataset into multiple parts, training the model on some parts and evaluating its performance on the remaining parts. This helps ensure that the model can generalize well to new data and is not simply memorizing the training data.
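In scikit-learn this is a one-liner with `cross_val_score`. The sketch below (dataset and depths are illustrative assumptions) uses 5-fold cross-validation to compare a fully grown tree against a depth-limited one; on noisy data the simpler tree tends to score better on the held-out folds.

```python
# Illustrative sketch: k-fold cross-validation to compare model complexities.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=20, flip_y=0.2,
                           random_state=0)

cv_means = []
for depth in (None, 3):    # None = fully grown tree, 3 = limited complexity
    model = DecisionTreeClassifier(max_depth=depth, random_state=0)
    scores = cross_val_score(model, X, y, cv=5)   # accuracy on 5 held-out folds
    cv_means.append(scores.mean())
    print(f"max_depth={depth}: mean CV accuracy = {scores.mean():.3f}")
```

Because every fold serves once as held-out data, the averaged score is a far more honest estimate of generalization than training accuracy.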
| Causes | Mitigation |
|---|---|
| Complex model:<br>– too many features<br>– too many parameters and layers (in the case of neural networks)<br>– hyperparameters such as tree depth in decision trees, the order of the polynomial used for regression, or SVM kernels | Reduce the model complexity by:<br>– using fewer features (feature selection, dimensionality reduction)<br>– reducing the number of layers and units in the case of neural networks<br>– using optimal hyperparameters for model complexity |
| Too few training examples | Get more training data |
| No regularization | Regularization techniques:<br>– L1 or L2 regularization<br>– dropout<br>– early stopping |
| No cross-validation | Use cross-validation to pick a model that generalizes well |
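Early stopping, listed in the table above, deserves a quick illustration. The sketch below (a toy gradient-descent setup with assumed sizes, learning rate, and `patience`) tracks the loss on a held-out validation set and halts once it stops improving, keeping the best weights seen so far even if the training loss is still falling.

```python
# Illustrative sketch: early stopping on validation loss during gradient descent.
import numpy as np

rng = np.random.default_rng(0)
n, d = 30, 40                                   # fewer samples than features
w_true = np.zeros(d)
w_true[:3] = 1.0
X_tr = rng.normal(size=(n, d))
y_tr = X_tr @ w_true + rng.normal(0, 0.3, n)
X_va = rng.normal(size=(100, d))
y_va = X_va @ w_true + rng.normal(0, 0.3, 100)

w = np.zeros(d)
best_w, best_val = w.copy(), np.inf
patience, bad_rounds = 10, 0
for step in range(2000):
    grad = X_tr.T @ (X_tr @ w - y_tr) / n       # gradient of 0.5 * training MSE
    w -= 0.01 * grad
    val_loss = np.mean((X_va @ w - y_va) ** 2)  # monitor held-out loss
    if val_loss < best_val:
        best_val, best_w, bad_rounds = val_loss, w.copy(), 0
    else:
        bad_rounds += 1
        if bad_rounds >= patience:              # no improvement: stop early
            break

print(f"best validation MSE = {best_val:.3f}")
```

The same pattern appears in deep learning frameworks (e.g. Keras's `EarlyStopping` callback): training stops when the validation metric plateaus, before the model drifts into noise-fitting.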
Miscellaneous: Would collecting more training data necessarily help with overfitting?
A larger dataset gives a model access to a greater variety of data, which lowers its variance on future observations. However, if the additional data does not provide additional information to the model, more data alone will not necessarily improve performance. In some applications, collecting data can be very time-consuming, so simply seeking more observations might not be the most practical approach. The most important consideration regarding data size is having enough data to sufficiently explore the feature space.
Visual Explanation
The following infographic shows what the decision boundary of an overfitted classification model and the fit of an overfitted regression model look like compared to optimally fitted models. It also summarizes the key characteristics of overfitting and how to mitigate it.

Video Explanations
- For a quick introduction to what overfitting is and how to identify whether a model is overfitted, see this explanation from IntuitiveML [Runtime: 1:40 mins]
- To understand what an overfitted regression or classification model looks like, see this video from Andrew Ng [Runtime: 12 mins]:
https://www.youtube.com/watch?v=8upNQi-40Q8&list=PLkDaE6sCZn6FNC6YRfRQc_FbeQrF8BwGI&index=37
- For a thorough understanding of overfitting, how random noise affects it, and how it can be mitigated more formally, see this excellent video lecture from Yaser Abu-Mostafa [Runtime: 1 hr 20 mins]:
https://www.youtube.com/watch?v=EQWr3GGCdzw