Machine Learning Resources

What is meant by Corpus and Vocabulary in Natural Language Processing?

A corpus is the entire set of documents under consideration. What counts as a document in Natural Language Processing depends on the task: the text being analyzed could be entire journal articles or short movie reviews. Even a single sentence, such as one row of a DataFrame, can be treated as a document. The vocabulary is the set of all distinct words that appear anywhere in the corpus. For example, consider the following corpus of three documents:

  1. It is cold outside today.
  2. I love the beach.
  3. Pizza is for lunch today.

The vocabulary would be {It, is, cold, outside, today, I, love, the, beach, Pizza, for, lunch}. Words that occur in more than one document, such as "is" and "today", are counted only once.
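The vocabulary for this corpus can be built with a few lines of Python. This is a minimal sketch, not a standard recipe: the names `tokenize` and `build_vocabulary` are hypothetical, and the tokenizer simply keeps runs of letters, so punctuation is dropped and case is preserved to match the example above. Real pipelines often lowercase the text and use a more careful tokenizer.

```python
import re

# The three example documents from the corpus above.
corpus = [
    "It is cold outside today.",
    "I love the beach.",
    "Pizza is for lunch today.",
]


def tokenize(document):
    """Split a document into words, keeping only runs of letters (assumed rule)."""
    return re.findall(r"[A-Za-z]+", document)


def build_vocabulary(documents):
    """Return the distinct words of the corpus in order of first appearance."""
    vocabulary = []
    seen = set()
    for document in documents:
        for token in tokenize(document):
            if token not in seen:
                seen.add(token)
                vocabulary.append(token)
    return vocabulary


vocabulary = build_vocabulary(corpus)
print(vocabulary)
print(len(vocabulary))  # 12 distinct words
```

The corpus contains 14 word occurrences in total, but "is" and "today" each appear in two documents, so the vocabulary has 12 entries.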
