What hyper-parameters are typically tuned in an SVM?

  • C (regularization parameter): C controls how heavily misclassifications are penalized when fitting the soft-margin hyperplane, and therefore the shape of the decision boundary. It is inversely proportional to the amount of regularization: larger values of C mean less regularization and a more complex, jagged decision boundary that fits the training data closely. Smaller values of C mean more regularization and a smoother decision boundary that tolerates more misclassifications in the training data.
  • Kernel function: As discussed, a kernel function must be specified when training an SVM. If the decision boundary is believed to be linear, the linear kernel is the simplest and most efficient choice. In more complex, non-linear situations, the RBF kernel is usually the recommended choice, and it requires the gamma parameter to be provided and tuned.
  • Gamma (RBF kernel only): Gamma controls how far the influence of an individual observation reaches when measuring the similarity between observations. With smaller gamma values, observations that are further apart can still be considered similar, which results in smoother decision boundaries. With larger gamma values, points must be close together to be considered similar, which results in less smooth decision boundaries. Gamma and C should be tuned together using cross-validation, because the two hyper-parameters often interact; a joint search over both, such as a grid search, is typically used to find the best combination of settings.
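
To make the effect of gamma described above concrete, here is a minimal sketch using only the Python standard library. It assumes the common parameterization of the RBF kernel, k(x, y) = exp(-gamma * ||x - y||^2). The function name rbf_kernel, the two example points, and the gamma values are illustrative assumptions, not taken from the text.

```python
import math


def rbf_kernel(x, y, gamma):
    """RBF similarity exp(-gamma * ||x - y||^2) between two points."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


# Two illustrative points at Euclidean distance 2 (squared distance 4).
x, y = (0.0, 0.0), (2.0, 0.0)

# Hypothetical gamma values, chosen only to show the trend.
gammas = [0.01, 0.1, 1.0, 10.0]
similarities = {g: rbf_kernel(x, y, g) for g in gammas}

# Similarity of the same pair of points shrinks as gamma grows.
for g, k in similarities.items():
    print(f"gamma={g:<5} similarity={k:.4f}")
```

For a fixed pair of points, the similarity falls toward zero as gamma increases. This is why a large gamma lets only nearby training points influence a prediction, while a small gamma lets distant points count as similar and gives a smoother boundary.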