What are some of the evaluation criteria used to assess the fit of a Linear Regression model? 

After confirming that the fitted model meets the assumptions necessary for linear regression, the next step of a regression analysis is usually to evaluate how well the model is performing in terms of fit and accuracy. 

  • Global F test: This is the highest-level measure of model significance; it reports only whether any component of the model is significant. The null hypothesis is that every slope coefficient equals zero, and the alternative is that at least one does not. The test statistic is a signal-to-noise ratio:

$$ F = \frac{\mathrm{MSR}}{\mathrm{MSE}} $$

MSR: Mean Square due to Regression
MSE: Mean Squared Error

MSR measures the signal the model captures beyond simply predicting the overall mean for every observation, and MSE is the residual component that measures how far the predictions fall from the actual values. With p predictors and n observations, F is compared against an F distribution with p and n - p - 1 degrees of freedom. This test is often not very informative in practice, especially when the model has many predictors, because a significant result says only that at least one predictor matters, not which ones.
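As a rough illustration of the signal-to-noise idea, the sketch below computes the F statistic by hand for a one-predictor model. The data, the helper names (fit_line, global_f), and the choice of plain Python are all assumptions made for the example; it assumes an ordinary least squares fit with an intercept, and it stops at the statistic itself rather than the p-value.

```python
def fit_line(x, y):
    """Ordinary least squares for a single predictor; returns (intercept, slope)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def global_f(y, y_hat, p):
    """F = MSR / MSE for a model with p predictors plus an intercept."""
    n = len(y)
    y_bar = sum(y) / n
    # Signal: how far the fitted values move away from the overall mean.
    ssr = sum((f - y_bar) ** 2 for f in y_hat)
    # Noise: how far the observations sit from the fitted values.
    sse = sum((a - f) ** 2 for a, f in zip(y, y_hat))
    msr = ssr / p            # regression mean square
    mse = sse / (n - p - 1)  # residual mean square
    return msr / mse


# Hypothetical toy data, roughly y = 2x.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_line(x, y)
y_hat = [b0 + b1 * xi for xi in x]
f_stat = global_f(y, y_hat, p=1)
print(f"slope = {b1:.2f}, F = {f_stat:.1f}")
```

A large F here reflects that the fitted line explains far more variation than is left in the residuals; turning it into a p-value would require the F distribution with p and n - p - 1 degrees of freedom.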

  • R-squared: In linear regression, the classic evaluation metric is R-squared or, when there are multiple predictors, adjusted R-squared. R-squared measures the proportion of the total variability in the response variable that is accounted for by the terms included in the model:

$$ R^2 = 1 - \frac{\mathrm{SSE}}{\mathrm{SST}} $$

SSE: Residual Sum of Squares (Sum of Squared Errors)
SST: Total Sum of Squares

Values closer to 1 indicate that the chosen predictors capture most of the variability in the data, meaning the residual sum of squares (SSE) is a small fraction of the total sum of squares (SST). Values close to 0 imply that the model does little better than predicting the overall mean for each observation, meaning most of the variability in the response is left unexplained. Regular R-squared never decreases when terms are added, even if they contribute nothing meaningful, so adjusted R-squared adds a penalty for each additional term. With n observations and p predictors (not counting the intercept), adjusted R-squared is found by:

$$ R^2_{adj} = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1} $$
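The following sketch shows how R-squared and its adjusted version can be computed from observed and predicted values. The numbers and the function names are hypothetical, chosen only so the arithmetic is easy to follow; the adjusted formula assumes n observations and p predictors plus an intercept.

```python
def r_squared(y, y_hat):
    """Proportion of total variability explained: 1 - SSE / SST."""
    y_bar = sum(y) / len(y)
    sse = sum((a - f) ** 2 for a, f in zip(y, y_hat))  # residual sum of squares
    sst = sum((a - y_bar) ** 2 for a in y)             # total sum of squares
    return 1 - sse / sst


def adjusted_r_squared(r2, n, p):
    """Penalize R-squared for the number of predictors p (intercept not counted)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Hypothetical observed values and model predictions.
y = [3.0, 5.0, 7.0, 9.0]
y_hat = [2.5, 5.5, 6.5, 9.5]

r2 = r_squared(y, y_hat)
adj_one = adjusted_r_squared(r2, n=len(y), p=1)
adj_two = adjusted_r_squared(r2, n=len(y), p=2)
print(r2, adj_one, adj_two)
```

Holding R-squared fixed, the adjusted value drops as p grows, which is the penalty at work: a second predictor has to improve the fit enough to offset it.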

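  • Measures of error (MSE, MAE, RMSE): These summarize the typical size of the residuals and can also be used as goodness-of-fit criteria; lower values indicate less unexplained noise and are preferred. Squared loss (MSE) is often preferred over absolute loss (MAE) because it penalizes the cases the model predicts poorly more heavily, although this also makes it more sensitive to outliers. Because MSE is in squared units and therefore hard to interpret, its square root (RMSE) is usually reported instead, which puts the error on the same scale as the target variable.

$$ \mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2 $$
$$ \mathrm{RMSE} = \sqrt{\mathrm{MSE}} $$
$$ \mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\lvert y_i - \hat{y}_i \rvert $$

MSE: Mean Squared Error
RMSE: Root Mean Squared Error
MAE: Mean Absolute Error

Here y_i is the observed value, \hat{y}_i is the prediction, and n is the number of observations. (When MSE is used in the F test, the sum of squared errors is divided by the residual degrees of freedom, n - p - 1 for p predictors, rather than by n.)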
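To make the difference between squared and absolute loss concrete, here is a small sketch that computes all three measures on hypothetical data in which one prediction misses badly. The values and function names are illustrative assumptions, and the averages divide by n.

```python
import math


def mse(y, y_hat):
    """Mean squared error, averaged over n observations."""
    return sum((a - f) ** 2 for a, f in zip(y, y_hat)) / len(y)


def rmse(y, y_hat):
    """Root mean squared error, on the same scale as y."""
    return math.sqrt(mse(y, y_hat))


def mae(y, y_hat):
    """Mean absolute error."""
    return sum(abs(a - f) for a, f in zip(y, y_hat)) / len(y)


# Hypothetical data: three small misses (0.5) and one large miss (2.0).
y = [3.0, 5.0, 7.0, 9.0]
y_hat = [2.5, 5.5, 6.5, 11.0]

mse_val = mse(y, y_hat)
rmse_val = rmse(y, y_hat)
mae_val = mae(y, y_hat)
print(mse_val, rmse_val, mae_val)
```

The single large miss pulls RMSE above MAE, which illustrates why squared loss is said to penalize poorly predicted cases more heavily.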

  • Information Criteria: Finally, more modern selection measures such as AIC and BIC attempt to balance fit and complexity through a calculation based on the penalized log-likelihood. The values of these metrics have no meaning on their own; they are useful only for comparing different models fit to the same data. In the case of information criteria, lower values are always preferred. Both are based on the maximized likelihood (L), and the only difference between the two formulas lies in the penalty for the number of estimated parameters (k), which for BIC also depends on the number of observations (n):

$$ \mathrm{AIC} = 2k - 2\ln(L) $$
$$ \mathrm{BIC} = k\ln(n) - 2\ln(L) $$

AIC: Akaike information criterion
BIC: Bayesian information criterion
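As a hedged illustration of how the two criteria trade fit against complexity, the sketch below compares two hypothetical models fit to the same 50 observations. It assumes Gaussian errors with the variance estimated by maximum likelihood (SSE / n), and it counts k as the number of regression coefficients including the intercept; other conventions also count the error variance, which shifts every model by the same constant when that convention is applied consistently. The SSE values and function names are made up for the example.

```python
import math


def gaussian_log_likelihood(sse, n):
    """Maximized log-likelihood of a linear model with Gaussian errors,
    using the maximum likelihood variance estimate sse / n."""
    return -0.5 * n * (math.log(2 * math.pi) + math.log(sse / n) + 1)


def aic(log_lik, k):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * k - 2 * log_lik


def bic(log_lik, k, n):
    """Bayesian information criterion: k ln(n) - 2 ln(L)."""
    return k * math.log(n) - 2 * log_lik


n = 50
# Hypothetical models: B fits slightly better but uses twice as many parameters.
ll_a = gaussian_log_likelihood(sse=120.0, n=n)
ll_b = gaussian_log_likelihood(sse=118.0, n=n)

aic_a, aic_b = aic(ll_a, k=3), aic(ll_b, k=6)
bic_a, bic_b = bic(ll_a, k=3, n=n), bic(ll_b, k=6, n=n)
print(f"AIC: A={aic_a:.1f}, B={aic_b:.1f}")
print(f"BIC: A={bic_a:.1f}, B={bic_b:.1f}")
```

In this made-up comparison the small gain in fit from model B does not offset its extra parameters, so both criteria favor model A, and BIC does so by a wider margin because its per-parameter penalty, ln(n), exceeds 2 once n is larger than about 7.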