What is the Bias/Variance Tradeoff?

The bias/variance tradeoff refers to the challenge of finding a model that fits its training data well (low bias) while also generalizing well to unseen data (low variance). The tradeoff exists because, in most cases, as a model fits the training data more closely, it loses some of its ability to perform at an equivalent level on data it did not see during training.
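
To make the tradeoff concrete, here is a minimal sketch that fits polynomial regression models of increasing degree to noisy synthetic data and compares training and validation error. The sine-shaped target, the noise level, and the use of scikit-learn are illustrative assumptions, not part of the discussion above.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)

# Synthetic regression problem: a noisy sine wave (an assumed toy target).
X = rng.uniform(0, 1, size=(30, 1))
y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.2, size=30)
X_val = rng.uniform(0, 1, size=(30, 1))
y_val = np.sin(2 * np.pi * X_val).ravel() + rng.normal(0, 0.2, size=30)

for degree in (1, 4, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X, y)
    train_mse = mean_squared_error(y, model.predict(X))
    val_mse = mean_squared_error(y_val, model.predict(X_val))
    # Degree 1 underfits (high bias): both errors are high. Degree 15
    # overfits (high variance): training error keeps falling while
    # validation error climbs back up.
    print(f"degree={degree:2d}  train MSE={train_mse:.3f}  val MSE={val_mse:.3f}")
```

The middle degree typically lands near the sweet spot: low error on both sets, which is exactly the balance the tradeoff asks for.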

If a model has high bias, it fails to learn the relationship between the input features and the target, which is referred to as underfitting. If a model has high variance, it fits the noise in its training data and does not generalize well to new data, which is referred to as overfitting. The latter often occurs with complex machine learning algorithms such as ensemble methods and neural networks. Evaluating a model in terms of both bias and variance is therefore an important part of the machine learning process. It is especially important to confirm that a model performs well on unseen data before putting it into production, because validation performance metrics are often the best proxy for how it will perform on live data.
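
As a hedged illustration of that pre-production check, the sketch below compares training and validation accuracy for a shallow decision tree versus an unconstrained one; the synthetic dataset and the scikit-learn models are assumptions chosen for brevity.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumed toy classification dataset, held out into train/validation splits.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

for depth in (2, None):  # a shallow tree vs. an unconstrained one
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    # Low scores on both splits signal high bias (underfitting); a large
    # gap between training and validation scores signals high variance
    # (overfitting).
    print(f"max_depth={depth}: train={tree.score(X_train, y_train):.2f}, "
          f"val={tree.score(X_val, y_val):.2f}")
```

Reading the two scores side by side, rather than the training score alone, is what surfaces a high-variance model before it reaches production.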