Pros:

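- Computationally efficient in high-dimensional feature spaces: the kernel trick computes inner products in the transformed space without explicitly constructing it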
- Ability to learn non-linear decision boundaries
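- Performs well with high-dimensional data, including cases where the number of features exceeds the number of samples
- No distributional assumptions to verify; however, features should be scaled first, because the margin and common kernels (e.g., RBF) depend on distances between points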
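To illustrate the kernel trick, the sketch below compares a degree-2 polynomial kernel, $K(x, z) = (x \cdot z + c)^2$, with the inner product of an explicit feature map $\phi$ for 2-D inputs. The specific kernel, the constant `c = 1`, and the function names are illustrative choices, not taken from the notes above. The kernel costs a single dot product, whereas the explicit map has to build a 6-dimensional vector (and that dimension grows quickly with input size and polynomial degree).

```python
import math


def poly_kernel(x, z, c=1.0):
    """Degree-2 polynomial kernel K(x, z) = (x . z + c)^2, computed directly
    from the original inputs (no feature map is ever built)."""
    dot = sum(xi * zi for xi, zi in zip(x, z))
    return (dot + c) ** 2


def explicit_feature_map(x, c=1.0):
    """Explicit 6-D feature map phi(x) for a 2-D input x whose inner product
    reproduces the degree-2 polynomial kernel above (illustrative only)."""
    x1, x2 = x
    r2 = math.sqrt(2.0)
    r2c = math.sqrt(2.0 * c)
    return [x1 * x1, x2 * x2, r2 * x1 * x2, r2c * x1, r2c * x2, c]


x, z = (1.0, 2.0), (3.0, -1.0)
implicit = poly_kernel(x, z)
explicit = sum(a * b for a, b in zip(explicit_feature_map(x), explicit_feature_map(z)))
# Both routes give the same value (up to floating-point rounding).
print(math.isclose(implicit, explicit))  # True
```

An SVM with a kernel only ever needs values like `implicit`, so it can work in the transformed space without paying for `explicit_feature_map`.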

Cons:

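- Performance is sensitive to the choice of kernel and its hyperparameters (e.g., the regularization parameter C and the RBF kernel width gamma)
- Does not natively produce probability scores; it outputs class labels and decision-function values, which need extra calibration (e.g., Platt scaling) to be interpreted as probabilities
- No direct way to determine variable importance with non-linear kernels (may need a model-agnostic approach such as permutation importance)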
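The model-agnostic permutation approach can be sketched in plain Python: shuffle one feature column at a time and measure how much accuracy drops. Here `predict` is assumed to be any callable mapping a list of rows to a list of labels (such as a fitted SVM's prediction method), and `toy_predict` is a hypothetical stand-in used only for the demo. scikit-learn ships a full implementation as `sklearn.inspection.permutation_importance`.

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when each feature column is randomly shuffled.

    Only calls `predict`, so it works for any model, including
    non-linear-kernel SVMs that expose no coefficients.
    """
    rng = random.Random(seed)
    baseline = accuracy(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and y
            X_perm = []
            for row, value in zip(X, column):
                new_row = list(row)
                new_row[j] = value
                X_perm.append(new_row)
            drops.append(baseline - accuracy(y, predict(X_perm)))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical stand-in for a fitted SVM: predicts 1 when feature 0 is
# positive and ignores feature 1 entirely.
def toy_predict(rows):
    return [1 if row[0] > 0 else 0 for row in rows]


X = [[-2.0, 5.0], [-1.0, 3.0], [1.0, 4.0], [2.0, 6.0], [-3.0, 1.0], [3.0, 2.0]]
y = [0, 0, 1, 1, 0, 1]
# The second entry is 0.0 because toy_predict never looks at feature 1.
print(permutation_importance(toy_predict, X, y))
```

Because importance is measured on predictions alone, the same function applies unchanged whatever kernel the SVM uses; evaluating it on held-out data rather than training data gives a less optimistic picture.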