How does discriminant analysis work at a high level?

Discriminant analysis is a dimension reduction approach similar to principal components analysis (PCA), but applied in a classification context. Whereas PCA reduces the feature space in a way that captures the most variability in the data, discriminant analysis projects the data onto one or more new axes chosen to maximize the distance between class means while minimizing the variability within each class. It then estimates class probabilities using a specified decision rule, the most common of which is based on Bayes’ Theorem: the prior probability of an observation belonging to each class is combined with the conditional probability of the observed feature values given that class, which yields the posterior probability of each class given the features. The observation is assigned the label of the class with the highest posterior probability.
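
To make the Bayes decision rule concrete, here is a minimal sketch for a single feature and two classes. It assumes Gaussian class-conditional densities with one pooled variance shared by all classes, which is the usual linear discriminant analysis setup but is not stated in the answer above. The function names (`fit_lda_1d`, `posterior`, `predict`) and the toy data are hypothetical and chosen only for illustration.

```python
import math


def fit_lda_1d(xs, ys):
    """Estimate class priors, class means, and a pooled variance from 1-D data."""
    classes = sorted(set(ys))
    n = len(xs)
    priors, means = {}, {}
    pooled_ss = 0.0  # within-class sum of squares, accumulated over all classes
    for c in classes:
        pts = [x for x, y in zip(xs, ys) if y == c]
        priors[c] = len(pts) / n
        means[c] = sum(pts) / len(pts)
        pooled_ss += sum((x - means[c]) ** 2 for x in pts)
    variance = pooled_ss / (n - len(classes))
    return priors, means, variance


def posterior(x, priors, means, variance):
    """Apply Bayes' Theorem: posterior is proportional to prior * likelihood."""
    norm = math.sqrt(2 * math.pi * variance)
    scores = {
        c: priors[c] * math.exp(-((x - means[c]) ** 2) / (2 * variance)) / norm
        for c in priors
    }
    total = sum(scores.values())  # normalizing constant (the evidence)
    return {c: s / total for c, s in scores.items()}


def predict(x, priors, means, variance):
    """Assign the class with the highest posterior probability."""
    post = posterior(x, priors, means, variance)
    return max(post, key=post.get)


# Toy data (made up for illustration): class "a" clusters low, class "b" high.
xs = [1.0, 1.5, 2.0, 2.5, 5.0, 5.5, 6.0, 6.5]
ys = ["a", "a", "a", "a", "b", "b", "b", "b"]

priors, means, variance = fit_lda_1d(xs, ys)
label_low = predict(2.2, priors, means, variance)
label_high = predict(5.1, priors, means, variance)
```

With equal priors and a shared variance, the decision boundary in this sketch falls at the midpoint between the two class means, so points below it are labeled "a" and points above it "b". Unequal priors would shift that boundary toward the less common class.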