What is Factor Analysis, and how does it differ from PCA?

Factor Analysis is a dimensionality reduction technique that, like PCA, tries to explain the variability across a set of features. Rather than building new features as linear combinations of the original variables, it looks for hidden, or latent, constructs that cannot be measured directly but that, in theory, account for the shared variability in the raw data.

A canonical use case of Factor Analysis occurs in survey data, where a researcher might be interested in understanding an abstract notion such as intelligence or anxiety based on responses to a questionnaire. It is difficult if not impossible to directly measure such constructs, but in a well-designed survey, respondents’ answers to groups of items might coalesce around such a latent variable.

In terms of model specification, Factor Analysis can be thought of as running in the reverse direction from PCA. In PCA, each principal component is a weighted combination of the original variables, with the weights chosen to maximize the variance explained across the high-dimensional set of features. In Factor Analysis, the hidden constructs load onto the original variables instead. Each original variable is in effect regressed on the constructs, or common factors, with a residual term called the specific factor. The variance of that residual, the specific variance, is analogous to the sigma squared term in OLS.
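To make the contrast concrete, the two specifications can be written side by side. This is a sketch of the standard orthogonal factor model; the notation (p observed variables, m common factors, loadings \lambda_{jk}, specific variances \psi_j) is assumed here for illustration and does not come from the text above.

```latex
% PCA: the k-th component is a linear combination of the observed variables
z_k = w_k^{\top}(x - \mu), \qquad k = 1, \dots, m

% Factor Analysis: the j-th observed variable is a linear combination of the
% latent common factors, plus a variable-specific residual
x_j = \mu_j + \sum_{k=1}^{m} \lambda_{jk} f_k + \varepsilon_j, \qquad j = 1, \dots, p

% Assumed: \operatorname{Cov}(f) = I, \operatorname{Var}(\varepsilon_j) = \psi_j,
% and factors uncorrelated with residuals, so that
\operatorname{Var}(x_j) = \sum_{k=1}^{m} \lambda_{jk}^{2} + \psi_j,
\qquad
\Sigma = \Lambda \Lambda^{\top} + \Psi
```

Under these assumptions, the first term of Var(x_j) is the share of variance attributable to the common factors (the communality), and \psi_j plays the role that sigma squared plays in an OLS regression of x_j on the factors.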

The most significant practical difference is that Factor Analysis does not have a single unique mathematical solution: the loadings are determined only up to a rotation, so different rotations fit the data equally well yet produce different loadings and thus different interpretations. In general, PCA is better suited to data preprocessing for machine learning tasks, whereas Factor Analysis is more applicable to exploratory analysis in research settings.
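The rotation issue can be checked numerically. The sketch below uses only the Python standard library and a made-up loading matrix (4 observed variables, 2 factors; all numbers are hypothetical, not estimates from any dataset). It assumes the orthogonal factor model, in which the model-implied covariance is the loadings times their transpose plus a diagonal matrix of specific variances, and shows that rotating the loadings changes them without changing that implied covariance.

```python
import math

# Hypothetical loadings: 4 observed variables on 2 common factors.
loadings = [
    [0.8, 0.1],
    [0.7, 0.2],
    [0.2, 0.9],
    [0.1, 0.6],
]

# Specific variances chosen so that every observed variable has unit variance.
specific = [1.0 - sum(l * l for l in row) for row in loadings]


def rotate(loadings, theta):
    """Post-multiply a p x 2 loading matrix by a 2 x 2 orthogonal rotation."""
    c, s = math.cos(theta), math.sin(theta)
    return [[a * c - b * s, a * s + b * c] for a, b in loadings]


def implied_cov(loadings, specific):
    """Model-implied covariance: loadings * loadings^T + diag(specific)."""
    p = len(loadings)
    return [
        [
            sum(x * y for x, y in zip(loadings[i], loadings[j]))
            + (specific[i] if i == j else 0.0)
            for j in range(p)
        ]
        for i in range(p)
    ]


rotated = rotate(loadings, math.radians(30))

cov_original = implied_cov(loadings, specific)
cov_rotated = implied_cov(rotated, specific)

# Largest entrywise gap between the two implied covariance matrices.
max_diff = max(
    abs(a - b)
    for row_a, row_b in zip(cov_original, cov_rotated)
    for a, b in zip(row_a, row_b)
)

print("first row of original loadings:", loadings[0])
print("first row of rotated loadings: ", rotated[0])
print("max covariance difference:", max_diff)
```

The two loading matrices differ, but the covariance they imply agrees up to floating-point error, so the data alone cannot favor one over the other. This is why rotation criteria such as varimax are chosen for interpretability rather than fit.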