Machine Learning Resources

What are the advantages and disadvantages of the Bag-of-Words model?


Advantages:
  • Often achieves high accuracy on tasks where the frequency or occurrence of words is a predictive feature
  • Easy to implement (scikit-learn provides CountVectorizer for count vectorization and TfidfVectorizer for TF-IDF)
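
To make the "easy to implement" point concrete, here is a minimal hand-rolled sketch of count vectorization in plain Python. It is not scikit-learn's implementation: the helper names (tokenize, fit_vocabulary, to_count_vector) and the whitespace-only tokenizer are assumptions made for illustration, and a library vectorizer handles punctuation, sparse storage, and many other details.

```python
from collections import Counter


def tokenize(text):
    # Assumed tokenizer for this sketch: lowercase, split on whitespace only.
    return text.lower().split()


def fit_vocabulary(docs):
    # Assign each distinct token a column index, in sorted order.
    tokens = sorted({tok for doc in docs for tok in tokenize(doc)})
    return {tok: i for i, tok in enumerate(tokens)}


def to_count_vector(doc, vocab):
    # One count per vocabulary word; words outside the vocabulary are ignored.
    counts = Counter(tokenize(doc))
    return [counts.get(tok, 0) for tok in vocab]


docs = ["the cat sat on the mat", "the dog sat"]
vocab = fit_vocabulary(docs)
vectors = [to_count_vector(doc, vocab) for doc in docs]

print(list(vocab))  # column order of the vectors
for doc, vec in zip(docs, vectors):
    print(vec, "<-", doc)
```

Each document becomes a fixed-length vector with one entry per vocabulary word, which is the representation that a downstream classifier would consume.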


Disadvantages:
  • Word order is discarded, so the Bag-of-Words approach is likely unsuitable for tasks where the order of the word sequence matters (e.g. text generation, chatbots)
  • With a large vocabulary the vectors become high-dimensional and sparse, which raises computational cost and makes it harder to differentiate between vectors
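
Both limitations can be seen in a small sketch. The sentences below are made-up examples, and the whitespace tokenizer is an assumption chosen for brevity. The first part shows two sentences with opposite meanings mapping to the same bag; the second shows how few entries of each vector are non-zero once the vocabulary covers unrelated documents.

```python
from collections import Counter


def bag_of_words(text):
    # Assumed tokenizer for this sketch: lowercase, split on whitespace only.
    return Counter(text.lower().split())


# 1. Word order is lost: different meanings, identical bags.
a = bag_of_words("the dog bit the man")
b = bag_of_words("the man bit the dog")
same_bag = a == b
print("identical representation:", same_bag)

# 2. The vocabulary grows with every new topic, so vectors get sparse.
docs = [
    "the dog bit the man",
    "stock prices fell sharply today",
    "rain is expected over the weekend",
]
vocab = sorted({word for doc in docs for word in doc.lower().split()})
nonzero_counts = [len(bag_of_words(doc)) for doc in docs]

print("vocabulary size:", len(vocab))
for doc, nonzero in zip(docs, nonzero_counts):
    print(f"{nonzero}/{len(vocab)} non-zero entries <- {doc}")
```

With only three short documents most entries are already zero; on a real corpus the vocabulary can reach tens of thousands of words, which is why sparse matrix formats are typically used for these vectors.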
