

Why does multicollinearity result in poor estimates of coefficients in linear regression?


In matrix form, the vector of coefficient estimates is $\hat{\beta} = (X^\top X)^{-1} X^\top Y$, where $X$ is the design matrix whose rows correspond to observations and whose columns correspond to features, and $Y$ is the vector of target values.
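
As a minimal sketch of that formula, here is how the estimates could be computed with NumPy on synthetic data (the variable names and data are illustrative, not from the original post):

```python
import numpy as np

# Toy data: 100 observations, two features plus an intercept column.
rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])              # design matrix
y = 2.0 + 3.0 * x1 - 1.5 * x2 + rng.normal(scale=0.5, size=n)

# Solve the normal equations (X'X) beta = X'y directly; this is
# equivalent to (X'X)^{-1} X'y but avoids forming the inverse,
# which is cheaper and numerically more stable.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)   # approximately [2.0, 3.0, -1.5]
```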

Because $X^\top X$ must be inverted, the computation fails outright when the matrix is singular, which happens under perfect multicollinearity, for example when one feature is an exact linear function of the others. Even when the multicollinearity is not perfect and estimates can still be computed, $X^\top X$ is nearly singular, so the entries of $(X^\top X)^{-1}$ become large; since the covariance of the estimates is $\sigma^2 (X^\top X)^{-1}$, the coefficient estimates carry inflated standard errors, which reduces the chance of finding a feature significant based on its p-value. The estimates are also overly sensitive to small changes in the model: a given predictor's estimated effect can change drastically when a single other variable is added to or removed from the equation.
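
To make that instability concrete, here is a small NumPy sketch (synthetic data; the setup is an assumption chosen for illustration) in which two nearly collinear predictors split the true effect unpredictably between them:

```python
import numpy as np

# Two nearly collinear predictors: x2 is x1 plus tiny noise.
rng = np.random.default_rng(1)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)
y = 2.0 * x1 + rng.normal(scale=0.5, size=n)   # true effect lives on x1 alone

def ols(X, y):
    """OLS coefficients via the normal equations."""
    return np.linalg.solve(X.T @ X, X.T @ y)

X_both = np.column_stack([x1, x2])
print("x1 and x2 together:", ols(X_both, y))

# Refit after a tiny perturbation of the targets: the two coefficients
# swing wildly (their sum stays near 2.0, but the split is unstable).
y2 = y + rng.normal(scale=0.05, size=n)
print("after a tiny perturbation:", ols(X_both, y2))

# Dropping the redundant column restores a stable estimate near 2.0.
print("x1 alone:", ols(x1.reshape(-1, 1), y))
```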
