Inverse Document Frequency (IDF) builds on Term Frequency by down-weighting terms that appear in many documents across the corpus, so that common words (such as "the") contribute less than rare, distinctive ones.
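A minimal sketch of the idea, using a hypothetical toy corpus and the common log(N / df) formulation (variants of the formula exist):

```python
import math

# Hypothetical toy corpus: each document is a list of tokens.
docs = [
    ["the", "cat", "sat"],
    ["the", "dog", "ran"],
    ["the", "cat", "ran"],
]

def idf(term, docs):
    """Inverse document frequency: log(N / df), where N is the number
    of documents and df is how many of them contain the term. A term
    appearing in every document gets weight 0; rarer terms get more."""
    n = len(docs)
    df = sum(1 for d in docs if term in d)
    return math.log(n / df)

print(idf("the", docs))  # in all 3 docs -> log(3/3) = 0.0
print(idf("dog", docs))  # in 1 of 3 docs -> log(3) ~ 1.0986
```

Multiplying a term's frequency in a document by its IDF gives the familiar TF-IDF weight.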
A Term Frequency matrix has one row per document ID in the corpus and one column per term in the vocabulary; each entry records how many times that term occurs in that document.
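This layout can be sketched with a small invented corpus; the document IDs and tokens below are illustrative only:

```python
from collections import Counter

# Hypothetical documents, keyed by document ID.
docs = {
    "d1": ["cat", "sat", "cat"],
    "d2": ["dog", "ran"],
}

# Columns: the sorted vocabulary across all documents.
vocab = sorted({t for toks in docs.values() for t in toks})

# Rows: one count vector per document ID; each entry is the
# raw frequency of the corresponding vocabulary term.
tf = {doc_id: [Counter(toks)[t] for t in vocab]
      for doc_id, toks in docs.items()}

print(vocab)     # ['cat', 'dog', 'ran', 'sat']
print(tf["d1"])  # [2, 0, 0, 1]
print(tf["d2"])  # [0, 1, 1, 0]
```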
Vector space models are a family of models that represent documents (or words) as vectors in a shared high-dimensional space, so that similarity can be measured geometrically, for example by the angle between two vectors.
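One common geometric similarity measure in such a space is cosine similarity; a small sketch over invented term-count vectors:

```python
import math

def cosine(u, v):
    """Cosine of the angle between two vectors: 1.0 means the same
    direction, 0.0 means orthogonal (no shared terms)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Hypothetical count vectors over the vocabulary ["cat", "dog", "sat"].
doc_a = [2, 0, 1]
doc_b = [1, 0, 1]
doc_c = [0, 3, 0]

print(cosine(doc_a, doc_b))  # high: the documents share terms
print(cosine(doc_a, doc_c))  # 0.0: no terms in common
```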
Tokenization is the process of splitting the text of a document into its smallest meaningful units, or tokens, such as words, subwords, or punctuation marks.
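A minimal word-level tokenizer as a sketch; real pipelines may preserve punctuation as tokens or use subword schemes instead:

```python
import re

def tokenize(text):
    """Lowercase the text and extract runs of letters, digits, and
    apostrophes as tokens, discarding punctuation and whitespace."""
    return re.findall(r"[a-z0-9']+", text.lower())

print(tokenize("The cat sat on the mat."))
# ['the', 'cat', 'sat', 'on', 'the', 'mat']
```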
A corpus of text is the entire set of documents considered.