How does SVM adjust for classes that cannot be linearly separated?

While the maximum margin classifier is optimal in theory, in practice the observations in most classification problems cannot be perfectly separated, so expecting to find a hyperplane that separates the classes with no misclassifications is unrealistic. Instead of a hard margin classifier, SVM uses a soft margin that tolerates some misclassifications during training. The extent to which the algorithm is allowed to misclassify is controlled by a regularization parameter C, which is typically tuned during cross-validation. This is another example of the bias/variance tradeoff that recurs throughout machine learning: the soft margin classifier introduces some bias in the hope of reducing variance when classifying future observations. With a soft margin classifier, the support vectors include observations that lie on the margin as well as those that violate it.
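A minimal sketch of this in scikit-learn, using a synthetic overlapping dataset (the dataset, the candidate C grid, and the two illustrative C values are assumptions for illustration). Note that in scikit-learn's `SVC`, a *smaller* C means stronger regularization and a softer margin, so more observations end up as support vectors:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Two overlapping classes: no hyperplane separates them perfectly.
X, y = make_blobs(n_samples=200, centers=2, cluster_std=3.0, random_state=0)

# Tune the regularization parameter C by 5-fold cross-validation
# (the grid of candidate values here is illustrative).
grid = GridSearchCV(SVC(kernel="linear"), {"C": [0.01, 0.1, 1, 10, 100]}, cv=5)
grid.fit(X, y)
print("best C:", grid.best_params_["C"])

# A softer margin (small C) tolerates more margin violations, so the
# set of support vectors -- points on or within the margin -- grows.
soft = SVC(kernel="linear", C=0.01).fit(X, y)
hard = SVC(kernel="linear", C=100).fit(X, y)
print("support vectors with C=0.01:", soft.n_support_.sum())
print("support vectors with C=100: ", hard.n_support_.sum())
```

Comparing `n_support_` across the two fits makes the tradeoff concrete: shrinking C widens the margin and pulls more observations into the support-vector set, trading training accuracy (bias) for a decision boundary that is less sensitive to individual points (variance).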