Advantages:
- Often achieves high accuracy on tasks where word frequency or presence is a predictive feature (e.g. spam filtering, topic classification)
- Easy to implement (scikit-learn provides CountVectorizer for raw word counts and TfidfVectorizer for TF-IDF weighting)
Disadvantages:
- If word order matters to the task (e.g. text generation, chatbots), Bag of Words is likely unsuitable, because it discards the sequence of words and keeps only their counts
- Large vocabularies produce high-dimensional, sparse vectors, which increase computation and memory costs and make it harder to distinguish between documents by comparing their vectors