The bias-variance tradeoff is a fundamental concept in supervised machine learning that describes the relationship between a model’s ability to fit the training data and its ability to generalize to new, unseen data. This tradeoff is important because a model that is too simple (i.e., has high bias) may underfit the data, while a model that is too complex (i.e., has high variance) may overfit the data.
Bias and Variance
Before we discuss the bias-variance tradeoff, let’s first understand what bias and variance mean. Bias refers to the difference between the expected or average prediction of the model and the true value of the target variable. A model with high bias is too simple and does not capture the underlying complexity of the data. As a result, it underfits the training data and performs poorly on both the training and test data. In other words, it has a high training error and a high test error.
On the other hand, variance refers to the sensitivity of the model’s predictions to small fluctuations in the training set. A model with high variance is too complex and captures noise or random fluctuations in the training data. As a result, it overfits: it performs well on the training data but poorly on the test data. In other words, it has a low training error but a high test error.
The goal in supervised machine learning is to find a model that strikes a balance between bias and variance, in a way that minimizes the overall error of the model. This is called the bias-variance tradeoff. The two usually pull in opposite directions: models with high bias tend to have low variance, while models with high variance tend to have low bias.
Consider the following figure:
The figure shows the relationship between model complexity and error. As the complexity of the model increases, the bias decreases and the variance increases. As a result, the total error of the model first decreases and then increases. The optimal model complexity is the point at which the total error is at its minimum.
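The relationship described above can be checked with a small simulation. The following is a minimal sketch, not a definitive experiment: it assumes a synthetic target `sin(2πx)` with Gaussian noise and uses k-nearest-neighbor regression, where a smaller `k` corresponds to a more complex model. The function names, sample sizes, and noise level are all illustrative choices.

```python
import math
import random

def true_f(x):
    # Assumed ground-truth function for the simulation.
    return math.sin(2 * math.pi * x)

def sample_training_set(n, noise_sd, rng):
    # Draw n noisy observations of true_f on [0, 1).
    xs = [rng.random() for _ in range(n)]
    ys = [true_f(x) + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys

def knn_predict(xs, ys, x0, k):
    # Average the targets of the k training points closest to x0.
    nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x0))[:k]
    return sum(ys[i] for i in nearest) / k

def bias_variance(k, n_train=30, n_repeats=200, noise_sd=0.3, seed=0):
    # Estimate squared bias and variance by refitting on many training sets.
    rng = random.Random(seed)
    test_points = [i / 20 for i in range(1, 20)]
    preds = [[] for _ in test_points]  # preds[j]: predictions at test_points[j]
    for _ in range(n_repeats):
        xs, ys = sample_training_set(n_train, noise_sd, rng)
        for j, x0 in enumerate(test_points):
            preds[j].append(knn_predict(xs, ys, x0, k))
    bias_sq = 0.0
    variance = 0.0
    for x0, p in zip(test_points, preds):
        mean_pred = sum(p) / len(p)
        bias_sq += (mean_pred - true_f(x0)) ** 2
        variance += sum((v - mean_pred) ** 2 for v in p) / len(p)
    m = len(test_points)
    return bias_sq / m, variance / m

# k=1 is the most complex model here, k=30 (all points) the simplest.
results = {k: bias_variance(k) for k in (1, 5, 30)}
for k, (b2, var) in results.items():
    print(f"k={k:2d}  bias^2={b2:.3f}  variance={var:.3f}  sum={b2 + var:.3f}")
```

With these settings, the simplest model (`k=30`, which always predicts the training mean) should show the largest squared bias, and the most complex one (`k=1`) the largest variance.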
How to find the optimal point in the bias-variance tradeoff
The primary way to manage the bias-variance tradeoff is through cross-validation. Cross-validation is a technique that splits the data into several subsets (folds), repeatedly trains the model on all but one fold, and evaluates it on the held-out fold. Averaging the held-out errors gives an estimate of the model’s performance on new, unseen data, which we can use to choose the hyperparameters that minimize the total error.
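As an illustration of this procedure, here is a minimal sketch of k-fold cross-validation used to pick a hyperparameter. It assumes synthetic data and a k-nearest-neighbor regressor whose number of neighbors is the hyperparameter being tuned. The helper names (`knn_predict`, `k_fold_mse`) and the candidate values are hypothetical.

```python
import math
import random

def knn_predict(xs, ys, x0, k):
    # Average the targets of the k training points closest to x0.
    nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x0))[:k]
    return sum(ys[i] for i in nearest) / k

def k_fold_mse(xs, ys, k_neighbors, n_folds=5):
    # Mean validation MSE over n_folds folds (fold = index modulo n_folds).
    n = len(xs)
    fold_errors = []
    for fold in range(n_folds):
        val_idx = [i for i in range(n) if i % n_folds == fold]
        train_idx = [i for i in range(n) if i % n_folds != fold]
        tr_x = [xs[i] for i in train_idx]
        tr_y = [ys[i] for i in train_idx]
        sq_errs = [
            (ys[i] - knn_predict(tr_x, tr_y, xs[i], k_neighbors)) ** 2
            for i in val_idx
        ]
        fold_errors.append(sum(sq_errs) / len(sq_errs))
    return sum(fold_errors) / n_folds

# Synthetic data (an assumption for this sketch): noisy samples of sin(2*pi*x).
rng = random.Random(0)
xs = [rng.random() for _ in range(60)]
ys = [math.sin(2 * math.pi * x) + rng.gauss(0.0, 0.3) for x in xs]

# Evaluate each candidate and keep the one with the lowest cross-validated error.
candidates = [1, 3, 5, 10, 20, 40]
cv_errors = {k: k_fold_mse(xs, ys, k) for k in candidates}
best_k = min(cv_errors, key=cv_errors.get)

for k in candidates:
    print(f"k={k:2d}  cv_mse={cv_errors[k]:.3f}")
print("selected k:", best_k)
```

Because the inputs are drawn at random, assigning folds by index is equivalent to a random split here. With ordered data, the indices would need to be shuffled first.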
Another way to manage the bias-variance tradeoff is through regularization. Regularization is a technique that adds a penalty term to the loss function of the model, which discourages the model from being too complex. The penalty term is typically a function of the weights of the model. The regularization parameter controls the strength of the penalty term: a larger value generally increases bias and reduces variance, so it determines where the model sits on the tradeoff.
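To make the effect of the penalty concrete, the following sketch uses L2 (ridge) regularization as one example of a penalty on the weights. It assumes the simplest possible setting, a single feature with no intercept, where the penalized least-squares problem has a closed-form solution. The data and the values of the regularization parameter `lam` are illustrative.

```python
import random

def ridge_weight(xs, ys, lam):
    # Minimizes sum((y - w * x)**2) + lam * w**2 over w.
    # Setting the derivative to zero gives w = sum(x*y) / (sum(x*x) + lam).
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

# Synthetic data (an assumption for this sketch): y = 2x plus Gaussian noise.
rng = random.Random(0)
xs = [rng.uniform(-1.0, 1.0) for _ in range(20)]
ys = [2.0 * x + rng.gauss(0.0, 0.5) for x in xs]

# lam = 0 is ordinary least squares; larger lam shrinks the weight toward 0.
lambdas = [0.0, 1.0, 10.0, 100.0]
weights = [ridge_weight(xs, ys, lam) for lam in lambdas]

for lam, w in zip(lambdas, weights):
    print(f"lambda={lam:6.1f}  weight={w:.3f}")
```

The fitted weight shrinks toward zero as `lam` grows. The model becomes less sensitive to the particular training sample (lower variance) at the cost of systematically underestimating the slope (higher bias).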
- The following video from Andrew Ng and other subsequent videos in this series explain the bias-variance tradeoff and how we can find an optimal operating point (Runtime: 9 mins)
- A mathematical way of looking at the bias-variance tradeoff is the bias-variance decomposition of the mean squared error. Under this, the mean squared error is decomposed into the sum of the squared bias, the variance, and the irreducible error as follows:
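One standard way to write this decomposition is sketched here. The notation is introduced for illustration: the data are assumed to follow $y = f(x) + \varepsilon$ with zero-mean noise of variance $\sigma^2$, and $\hat{f}$ denotes the model trained on a randomly drawn dataset $D$. Then, at a fixed input $x$:

$$
\mathbb{E}_{D,\varepsilon}\!\left[\big(y - \hat{f}(x)\big)^2\right]
= \underbrace{\big(\mathbb{E}_D[\hat{f}(x)] - f(x)\big)^2}_{\text{Bias}^2}
+ \underbrace{\mathbb{E}_D\!\left[\big(\hat{f}(x) - \mathbb{E}_D[\hat{f}(x)]\big)^2\right]}_{\text{Variance}}
+ \underbrace{\sigma^2}_{\text{Irreducible error}}
$$

The first two terms depend on the choice of model, while the noise term $\sigma^2$ cannot be reduced by any model.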
The following video from Jeff Miller (Mathematical Monk) goes through the derivation of this decomposition: https://www.youtube.com/watch?v=C3nIFH649wY