What is IDF? Why do we need IDF?

Inverse Document Frequency (IDF) builds on Term Frequency (TF) by down-weighting words that appear in many documents across the corpus. It thus reduces the importance given to words that are common in general rather than characteristic of one specific document. The product of Term Frequency and Inverse Document Frequency is the TF-IDF score, which is commonly computed as a preprocessing step before text classification. A high Term Frequency in a given document only means that the word appears frequently in that document; a high TF-IDF score is a better indication that the word is important to that document relative to the entire corpus.
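
As a rough illustration of the difference between a high Term Frequency and a high TF-IDF score, here is a minimal sketch on a toy corpus. The formula variant (term count divided by document length for TF, natural log of the number of documents over the document frequency for IDF), the function names, and the three example sentences are all assumptions made for this example; real libraries often use smoothed or otherwise adjusted variants.

```python
import math
from collections import Counter


def tf(term, doc_tokens):
    # Term Frequency: share of the document's tokens that are `term`.
    counts = Counter(doc_tokens)
    return counts[term] / len(doc_tokens)


def idf(term, corpus):
    # Inverse Document Frequency: log(N / df), where df is the number of
    # documents containing `term`. Unsmoothed variant (an assumption);
    # returns 0.0 for terms that appear in no document.
    n_docs = len(corpus)
    df = sum(1 for doc in corpus if term in doc)
    return math.log(n_docs / df) if df else 0.0


def tf_idf(term, doc_tokens, corpus):
    # TF-IDF: product of the two quantities above.
    return tf(term, doc_tokens) * idf(term, corpus)


# Hypothetical toy corpus, already tokenized by whitespace.
corpus = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "the bird sang".split(),
]

doc = corpus[0]
for word in ("the", "mat"):
    print(word, round(tf(word, doc), 3), round(tf_idf(word, doc, corpus), 3))
```

In this toy corpus, "the" has the higher Term Frequency in the first document, but because it occurs in every document its IDF is zero and so is its TF-IDF score. "mat" occurs only in the first document, so it ends up with the higher TF-IDF score despite appearing less often.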