Why does multicollinearity result in poor estimates of coefficients in linear regression?

In matrix form, the vector of coefficient estimates is given by $\hat{\beta} = (X'X)^{-1}X'Y$, where X is the design matrix (rows correspond to observations, columns to features) and Y is the vector of target values.
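As a minimal sketch of that formula, the snippet below solves the normal equations with NumPy. The helper name `ols_coefficients` and the synthetic data are assumptions made for illustration; the target is generated without noise so that the recovered coefficients match the true ones up to rounding.

```python
import numpy as np

def ols_coefficients(X, y):
    """Return the OLS estimate by solving (X'X) b = X'y for b."""
    XtX = X.T @ X
    Xty = X.T @ y
    # Solving the linear system is numerically preferable to forming
    # the inverse explicitly, but it is the same estimator.
    return np.linalg.solve(XtX, Xty)

# Illustrative data: an intercept column plus two independent features.
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
true_beta = np.array([1.0, 2.0, -3.0])
y = X @ true_beta  # noise-free target, so the fit should be exact

beta_hat = ols_coefficients(X, y)
print(beta_hat)
```

In practice a least-squares routine such as `np.linalg.lstsq` is usually the safer choice, since it does not require X'X to be well conditioned.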

Because X'X has to be inverted, the computation fails if it is singular, which happens under perfect multicollinearity, for example when one feature is an exact linear combination of others. Even when the multicollinearity is not perfect and estimates can be computed, X'X is close to singular, so the entries of its inverse are large. Since the variance of each coefficient estimate is proportional to the corresponding diagonal entry of that inverse, the estimates have larger standard errors, meaning there is less of a chance of finding a feature to be significant based on its p-value. Further, the estimates can be overly sensitive to small changes in the model: for a given predictor, the estimated effect size could differ drastically if another variable were added to or removed from the model.
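The effect on standard errors can be illustrated with a small simulation. This is a sketch under assumed conditions, not a general proof: the helper `coef_std_errors`, the sample size, the true coefficients, and the noise scales are all chosen for illustration. It fits the same model once with two independent features and once with a second feature that is almost a copy of the first, then checks the rank of a perfectly collinear design.

```python
import numpy as np

def coef_std_errors(X, y):
    """Return OLS estimates and their standard errors."""
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    resid = y - X @ beta_hat
    sigma2 = resid @ resid / (n - p)  # unbiased residual variance
    # Var(beta_hat) = sigma^2 * (X'X)^{-1}; standard errors are the
    # square roots of its diagonal.
    return beta_hat, np.sqrt(sigma2 * np.diag(XtX_inv))

rng = np.random.default_rng(42)
n = 500
x1 = rng.normal(size=n)
x2_indep = rng.normal(size=n)                  # unrelated to x1
x2_collinear = x1 + 0.01 * rng.normal(size=n)  # almost a copy of x1
noise = rng.normal(size=n)
beta = np.array([1.0, 2.0, 3.0])

X_indep = np.column_stack([np.ones(n), x1, x2_indep])
X_coll = np.column_stack([np.ones(n), x1, x2_collinear])

_, se_indep = coef_std_errors(X_indep, X_indep @ beta + noise)
_, se_coll = coef_std_errors(X_coll, X_coll @ beta + noise)

print("SE with independent features:", se_indep)
print("SE with near-collinear features:", se_coll)

# Perfect multicollinearity: the third column is exactly 2 * x1, so the
# design matrix loses a rank and X'X is singular.
X_perfect = np.column_stack([np.ones(n), x1, 2.0 * x1])
rank_perfect = np.linalg.matrix_rank(X_perfect)
print("Rank of perfectly collinear design:", rank_perfect)
```

With the near-collinear design, the standard errors on the two correlated features come out far larger than with independent features, even though the noise level and sample size are the same.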