A model that is underfit will produce poor evaluation metrics even on the training data, such as a high RMSE or a high misclassification rate. A model that is overfit will appear to perform well on the training data but will show a marked deterioration in its performance metrics on a validation data set; for example, a low RMSE on the training set but a high RMSE on the validation set.
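As a rough illustration of comparing training and validation metrics, the sketch below computes RMSE in plain Python and applies a simple heuristic label. The function names `rmse` and `diagnose_fit`, and the thresholds `high_error` and `gap_ratio`, are hypothetical and chosen only for illustration; sensible values depend on the scale of the target and on the problem.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length, non-empty sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def diagnose_fit(train_rmse, val_rmse, high_error=1.0, gap_ratio=1.5):
    """Heuristic label for a fitted model.

    high_error and gap_ratio are illustrative assumptions, not standard values.
    """
    if train_rmse >= high_error:
        # Poor even on the data the model was trained on.
        return "underfit"
    if val_rmse > gap_ratio * train_rmse:
        # Good on training data, much worse on held-out data.
        return "overfit"
    return "reasonable"
```

For example, `diagnose_fit(0.2, 0.9)` returns `"overfit"` because the validation RMSE is more than 1.5 times the training RMSE, while `diagnose_fit(2.0, 2.1)` returns `"underfit"` because the training RMSE is already above the assumed `high_error` threshold.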

A learning curve is a diagnostic tool that plots the error metric used to evaluate a machine learning algorithm on both the training and validation data at each iteration of the algorithm. In most cases, the training error, or deviance, continues to decrease as the model is built out, while the validation error decreases for a number of iterations before eventually increasing. The point at which the validation error first begins to rise indicates an appropriate number of iterations for balancing the bias/variance tradeoff.
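Given the per-iteration errors that a learning curve plots, the stopping point can also be located programmatically. This is a minimal sketch assuming the validation errors have already been recorded in a list, one value per iteration; the `first_rise` helper and its `patience` parameter are hypothetical names, with `patience` added as an assumption so that a single noisy uptick does not trigger an early stop.

```python
def first_rise(val_errors, patience=1):
    """Return the 0-based iteration index just before validation error first rises.

    patience: number of consecutive increases required before treating the
    curve as rising (an illustrative assumption; patience=1 reacts to the
    first uptick). If the error never rises, the last index is returned.
    """
    rises = 0
    for i in range(1, len(val_errors)):
        if val_errors[i] > val_errors[i - 1]:
            rises += 1
            if rises >= patience:
                # Step back to the iteration before this run of increases began.
                return i - rises
        else:
            rises = 0  # a flat or falling step resets the run
    return len(val_errors) - 1
```

With `val = [5, 4, 3, 3.5, 4]`, `first_rise(val)` returns `2`, so the first three iterations (indices 0 to 2) would be kept. On a noisier curve, a larger `patience` trades a slightly later stopping decision for robustness to small fluctuations.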

If a model is significantly underfit, both the training and validation errors will be high and will not improve significantly over further iterations. If the training error has not mostly flattened out by the last few iterations, the number of iterations is likely too small for the algorithm to learn the data adequately. On the other hand, if the training error is flat for many iterations while the validation error is increasing, the model is overfitting at that point, and the number of iterations should be reduced to the point at which the validation error first begins to rise. A classic learning curve is drawn below, with the optimal stopping point marked.
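The three situations described above can be turned into a rough, rule-based reading of a recorded learning curve. This is a sketch only: `read_learning_curve` is a hypothetical helper, and `high_error`, `flat_tol`, and `tail` are illustrative thresholds that would need tuning for any real metric and data set.

```python
def read_learning_curve(train, val, high_error=1.0, flat_tol=0.01, tail=3):
    """Heuristic diagnosis of a learning curve.

    train, val: per-iteration error values of equal length, longer than `tail`.
    All thresholds are illustrative assumptions, not standard values.
    """
    if len(train) != len(val) or len(train) <= tail:
        raise ValueError("need equal-length curves longer than `tail`")
    # Both errors still high at the end: the model never learned the data well.
    if train[-1] >= high_error and val[-1] >= high_error:
        return "underfit"
    # Relative drop in training error over the last `tail` iterations.
    before = train[-tail - 1]
    drop = (before - train[-1]) / before if before > 0 else 0.0
    if drop > flat_tol:
        return "training error not flat yet: try more iterations"
    # Training error flat while validation error climbs: overfitting.
    if val[-1] > val[-tail - 1]:
        # Index just before the validation error first increases (0-based).
        stop = next(i for i in range(1, len(val)) if val[i] > val[i - 1]) - 1
        return f"overfit: stop at 0-based iteration {stop}"
    return "reasonable"
```

For a curve whose training error levels off at 0.2 while the validation error bottoms out at index 3 and then climbs, the helper reports `"overfit: stop at 0-based iteration 3"`. The checks run in the same order as the prose: underfitting first, then an unconverged training curve, then overfitting.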