Pros:
- Relatively fast on small to medium-sized datasets, since the kernel trick avoids computing the high-dimensional feature mapping explicitly
- Ability to learn non-linear decision boundaries
- Performs well with high-dimensional data
- No distributional assumptions to verify; however, features should be scaled first, since kernels depend on distances or inner products between points
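
A minimal sketch of the scaling and non-linear boundary points above, assuming scikit-learn (not named in the original) and a synthetic dataset from `make_moons`. It fits an RBF-kernel SVM and a linear-kernel SVM, each inside a pipeline that standardizes the features first. The dataset and the `C` and `gamma` settings are illustrative assumptions, not recommendations.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic two-class data that is not linearly separable (illustrative only).
X, y = make_moons(n_samples=400, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Putting the scaler in the pipeline means it is fit on the training data only.
rbf_svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
linear_svm = make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0))

rbf_svm.fit(X_train, y_train)
linear_svm.fit(X_train, y_train)

# Held-out accuracy for each kernel.
rbf_acc = rbf_svm.score(X_test, y_test)
linear_acc = linear_svm.score(X_test, y_test)
print(f"RBF kernel accuracy:    {rbf_acc:.3f}")
print(f"Linear kernel accuracy: {linear_acc:.3f}")
```

On curved class boundaries like these, the RBF kernel would usually be expected to score higher than the linear one, though the gap depends on the data and the hyperparameters chosen.
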
Cons:
- Performance is sensitive to the choice of kernel and its hyperparameters
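- Does not natively produce probability scores, only class labels and signed decision values; probabilities require an extra calibration step (e.g., Platt scaling)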
- No direct way to determine variable importance with non-linear kernels (may require a model-agnostic permutation approach)
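
The last two limitations have common workarounds, sketched here under assumptions: scikit-learn is used (not named in the original), the data is synthetic, and sigmoid calibration via `CalibratedClassifierCV` stands in for whichever calibration method is preferred. Permutation importance is computed on held-out data with `permutation_importance`.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic data: 6 features, 3 of them informative (illustrative only).
X, y = make_classification(
    n_samples=500, n_features=6, n_informative=3, n_redundant=0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# A plain SVC exposes predict() and decision_function(), but no probabilities.
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf"))

# Sigmoid (Platt-style) calibration maps decision values to probabilities,
# using 5-fold cross-validation on the training data.
calibrated = CalibratedClassifierCV(svm, method="sigmoid", cv=5)
calibrated.fit(X_train, y_train)
proba = calibrated.predict_proba(X_test)  # shape: (n_samples, n_classes)

# Permutation importance: shuffle one feature at a time and measure the
# resulting drop in held-out accuracy.
result = permutation_importance(
    calibrated, X_test, y_test, n_repeats=10, random_state=0
)
for i in result.importances_mean.argsort()[::-1]:
    print(f"feature {i}: {result.importances_mean[i]:.3f} "
          f"+/- {result.importances_std[i]:.3f}")
```

Both steps add cost: calibration refits the model once per fold, and permutation importance re-scores it `n_repeats` times per feature.
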