Advantages:
- Captures information that individual tokens miss, since the full meaning often cannot be conveyed by words in isolation (e.g. "not good" versus "good")
Disadvantages:
- Increases the size of the vocabulary, and with it the feature dimensionality, potentially leading to slower training and a higher risk of overfitting
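
The trade-off above can be seen on a toy example. This is only a minimal sketch, and it assumes the technique under discussion is adding word n-grams (here bigrams) on top of single tokens. The helper names `ngrams` and `vocabulary` and the two-sentence corpus are made up for illustration and are not from any particular library.

```python
def ngrams(tokens, n):
    """Return the list of contiguous n-token tuples in `tokens`."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def vocabulary(corpus, max_n):
    """Collect every distinct n-gram of length 1..max_n across the corpus."""
    vocab = set()
    for sentence in corpus:
        tokens = sentence.lower().split()  # naive whitespace tokenization (assumption)
        for n in range(1, max_n + 1):
            vocab.update(ngrams(tokens, n))
    return vocab


# Hypothetical toy corpus: the two sentences differ only by "not".
corpus = [
    "the movie was not good",
    "the movie was good",
]

unigram_vocab = vocabulary(corpus, 1)  # single tokens only
bigram_vocab = vocabulary(corpus, 2)   # single tokens plus bigrams

print(len(unigram_vocab), len(bigram_vocab))
print(("not", "good") in bigram_vocab)
```

With bigrams included, the feature `("not", "good")` exists and can separate the two sentences, which single tokens alone represent almost identically. The cost is a larger vocabulary even on this tiny corpus, and on real text the number of distinct n-grams typically grows much faster than the number of distinct words.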