Although classification has so far been discussed mainly in the binary setting, most of the concepts carry over to an outcome with more than two levels, which is referred to as multi-class classification. Some algorithms support multiple classes more easily than others, but in general, a multi-class response can be handled using either a one vs. rest or a one vs. one approach.
1. One vs. Rest (OvR): This technique splits the multi-class problem into multiple binary classification problems. For each one, a binary classifier is fit to a target in which one level is the positive class and all other levels are grouped together as the negative class. The process repeats until every label has been treated as the positive class against all of the remaining classes. For example, if the target consists of the generic labels Class 1, Class 2, and Class 3, the following three binary classifications would be performed:
- Class 1 vs. [Class 2 + Class 3]
- Class 2 vs. [Class 1 + Class 3]
- Class 3 vs. [Class 1 + Class 2]
In general, if a multi-class response consists of k distinct categories, the one vs. rest approach requires k binary classifiers to be fit. To assign a class label to an observation, the positive-class probabilities from the k classifiers are compared, and the label whose classifier produced the highest probability is chosen. For example, if the three classifiers resulted in predictions of .3, .4, and .2 for one observation, it would be assigned the label Class 2. Because the classifiers are fit separately, these probabilities need not sum to 1.
2. One vs. One (OvO): This approach also splits the problem into binary classification problems, but unlike OvR, OvO fits a separate classifier for each pair of labels, using only the observations that belong to those two classes. In the generic example of three classes, OvO would perform the following binary classifications:
- Class 1 vs. Class 2
- Class 1 vs. Class 3
- Class 2 vs. Class 3
While OvR and OvO each require three binary classifiers for a response with three distinct levels, OvO requires significantly more models as the number of categories increases. Precisely, OvO fits k(k-1)/2 models for a target consisting of k distinct levels; with k = 10, that is 45 models, compared with 10 for OvR. To determine the predicted class label for an observation, the predictions for each label are added up across all classifiers in which that label was considered, and the class with the highest total is assigned to that observation.
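The two label-assignment rules above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function names (`ovr_targets`, `ovr_predict`, `ovo_pairs`, `ovo_predict`) are hypothetical, the OvO pairwise probabilities are made up for the example, and the binary classifiers themselves are assumed to have been fit already.

```python
from itertools import combinations


def ovr_targets(labels, positive):
    """Recode multi-class labels as 1 (positive class) or 0 (all other classes)."""
    return [1 if label == positive else 0 for label in labels]


def ovr_predict(positive_probs):
    """Return the label whose one-vs-rest classifier gave the highest probability."""
    return max(positive_probs, key=positive_probs.get)


def ovo_pairs(classes):
    """List every unordered pair of classes; there are k(k-1)/2 of them."""
    return list(combinations(classes, 2))


def ovo_predict(pair_probs):
    """Sum each label's predictions over the pairwise classifiers it took part in.

    pair_probs maps (first, second) to the predicted probability of `first`;
    the probability of `second` is taken to be the complement.
    """
    totals = {}
    for (first, second), p_first in pair_probs.items():
        totals[first] = totals.get(first, 0.0) + p_first
        totals[second] = totals.get(second, 0.0) + (1.0 - p_first)
    return max(totals, key=totals.get)


classes = ["Class 1", "Class 2", "Class 3"]

# OvR: the predictions from the text (.3, .4, .2) lead to Class 2.
ovr_label = ovr_predict({"Class 1": 0.3, "Class 2": 0.4, "Class 3": 0.2})
print(ovr_label)  # Class 2

# OvO: three pairwise models for k = 3, but 45 for k = 10.
print(len(ovo_pairs(classes)))    # 3
print(len(ovo_pairs(range(10))))  # 45

# Made-up pairwise probabilities for a single observation.
pair_probs = {
    ("Class 1", "Class 2"): 0.4,
    ("Class 1", "Class 3"): 0.7,
    ("Class 2", "Class 3"): 0.8,
}
ovo_label = ovo_predict(pair_probs)
print(ovo_label)  # Class 2
```

In the OvO example, Class 2 accumulates the largest total (roughly 0.6 + 0.8), so it is assigned to the observation. Summing hard votes instead of probabilities is a common alternative to the rule sketched here.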
While multi-class classification requires some nuance in order to assign a prediction to each observation, evaluation largely proceeds in an analogous fashion to binary classification. In the case of a 3-level response, a 3×3 confusion matrix can be created, and precision and recall can be computed for each class by treating that class as positive and all others as negative. The decision threshold for a class can be adjusted to optimize a particular metric for that class, just as the equivalent metric can be optimized in the binary case.
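As a rough illustration of the per-class metrics described above, the following sketch builds a 3×3 confusion matrix and computes precision and recall for each class. The labels and predictions are made up, and the helper names (`confusion_matrix`, `per_class_metrics`) are hypothetical; rows are assumed to index the true class and columns the predicted class.

```python
def confusion_matrix(y_true, y_pred, classes):
    """Count observations by (true class, predicted class)."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for truth, pred in zip(y_true, y_pred):
        matrix[index[truth]][index[pred]] += 1
    return matrix


def per_class_metrics(matrix, classes):
    """Return {class: (precision, recall)}, treating each class as positive in turn."""
    metrics = {}
    for i, c in enumerate(classes):
        true_positives = matrix[i][i]
        predicted_as_c = sum(row[i] for row in matrix)  # column total
        actually_c = sum(matrix[i])                     # row total
        precision = true_positives / predicted_as_c if predicted_as_c else 0.0
        recall = true_positives / actually_c if actually_c else 0.0
        metrics[c] = (precision, recall)
    return metrics


classes = [1, 2, 3]
# Made-up true labels and predictions for ten observations.
y_true = [1, 1, 1, 2, 2, 2, 3, 3, 3, 3]
y_pred = [1, 1, 2, 2, 2, 3, 3, 3, 3, 1]

matrix = confusion_matrix(y_true, y_pred, classes)
for row in matrix:
    print(row)
# [2, 1, 0]
# [0, 2, 1]
# [1, 0, 3]

metrics = per_class_metrics(matrix, classes)
print(metrics[3])  # (0.75, 0.75)
```

For Class 3, three of the four observations predicted as Class 3 are correct (precision 0.75), and three of the four true Class 3 observations are recovered (recall 0.75).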