### What is the basic idea of Support Vector Machine (SVM) and Maximum Margin?

SVM is a classification algorithm that seeks to determine a decision boundary by maximizing the distance between points of different classes.
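The idea can be sketched with a tiny linear SVM trained by stochastic sub-gradient descent on the regularized hinge loss. This is a minimal pure-Python sketch, not a production implementation; the toy data, learning rate, and regularization strength are illustrative assumptions.

```python
# Minimal linear SVM via sub-gradient descent on the L2-regularized hinge
# loss: lam/2 * ||w||^2 + max(0, 1 - y * (w.x + b)).
# Maximizing the margin corresponds to minimizing ||w|| subject to the
# margin constraints, which this relaxed objective approximates.

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def train_linear_svm(X, y, lam=0.01, eta=0.1, epochs=100):
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1:   # point violates the margin
                w = [wj + eta * (yi * xj - lam * wj) for wj, xj in zip(w, xi)]
                b += eta * yi
            else:                           # correct side: only regularize w
                w = [wj * (1 - eta * lam) for wj in w]
    return w, b

# Toy linearly separable data (hypothetical example)
X = [[2, 1], [3, 2], [-2, -1], [-3, -2]]
y = [1, 1, -1, -1]
w, b = train_linear_svm(X, y)
preds = [1 if dot(w, x) + b > 0 else -1 for x in X]
```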

While the maximum margin classifier is optimal in theory, in practice, observations cannot be perfectly separated in most classification problems.

While soft margin classification relaxes the requirement of a hyperplane that must perfectly distinguish between the classes, a separate issue arises when there is no way to define such a hyperplane in the original feature space.

The kernel trick allows SVM to form a non-linear decision boundary by implicitly operating in a higher-dimensional space, without ever computing the transformed features explicitly.

Common choices of kernel include the linear, polynomial, radial basis function (RBF), and sigmoid kernels.
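The trick can be illustrated with a degree-2 polynomial kernel, whose value equals an inner product in an explicit higher-dimensional feature space. A small sketch; the specific vectors are made up for illustration:

```python
import math

def poly2_kernel(x, z):
    # K(x, z) = (x . z)^2 -- computed entirely in the original 2-D space
    return (x[0] * z[0] + x[1] * z[1]) ** 2

def phi(x):
    # Explicit degree-2 feature map: phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2)
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)

x, z = (1.0, 2.0), (3.0, 4.0)
kernel_value = poly2_kernel(x, z)                # (1*3 + 2*4)^2 = 121
explicit_value = sum(a * b for a, b in zip(phi(x), phi(z)))
# The kernel gives the same inner product without building phi(x) at all.
```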

Pros: Relatively fast to compute, since the kernel trick avoids explicitly constructing high-dimensional feature maps.

Cons: Performance of the algorithm is sensitive to the choice of kernel and its parameters.

While SVM is used most often in classification scenarios, it can be extended to regression cases by allowing the user to provide a maximum margin of error allowed for any observation.

The hinge loss function quantifies the magnitude of error in SVM using an observation's distance from the hyperplane that separates the classes.

Hinge loss adds an increased penalty to misclassifications that are off by a large amount, since the cost function increases linearly as the decision function output moves further away from the actual label.
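This behavior follows directly from the definition hinge(y, f) = max(0, 1 - y·f), where y is the true label in {-1, +1} and f is the decision function output. A small sketch with illustrative decision values:

```python
def hinge_loss(y, f):
    # y: true label in {-1, +1}; f: decision function output
    return max(0.0, 1.0 - y * f)

correct_with_margin = hinge_loss(1, 2.0)    # outside margin: no penalty
on_boundary = hinge_loss(1, 0.0)            # on the decision boundary
misclassified = hinge_loss(1, -1.0)         # wrong side of the boundary
badly_misclassified = hinge_loss(1, -3.0)   # penalty grows linearly
```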

Key hyperparameters to tune: C (the regularization parameter), the kernel function, and gamma (for the RBF kernel).
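Gamma controls how quickly the RBF kernel K(x, z) = exp(-gamma·||x - z||²) decays with distance, and therefore the effective reach of each training point: small gamma yields smooth, far-reaching influence, while large gamma makes influence very local. A sketch with made-up points:

```python
import math

def rbf_kernel(x, z, gamma):
    # K(x, z) = exp(-gamma * ||x - z||^2)
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * sq_dist)

x, z = (0.0, 0.0), (1.0, 1.0)         # squared distance = 2
wide = rbf_kernel(x, z, gamma=0.1)    # exp(-0.2): points still similar
narrow = rbf_kernel(x, z, gamma=10)   # exp(-20): similarity nearly zero
```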
