What is Clustering?

Let's begin with an example:

“Joe Biden wins US election” – Washington Post

“A massive win for Joe Biden in the closest election ever ran in the US history” – New York Times

“Joe Biden wins the presidential race” – Wall Street Journal

From a newsreader’s perspective, all three headlines report the same story, so there is little point in reading all three articles. News aggregators such as Google News use Clustering methods to bundle similar articles under a single topic while keeping unrelated stories separate.
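To make the idea concrete, here is a toy sketch that groups the three headlines above by word overlap. It is not how Google News works. It assumes a crude bag-of-words representation, Jaccard similarity, and a hypothetical similarity threshold of 0.2, and the fourth headline is made up purely to stand in for an unrelated story.

```python
import re

def tokenize(text):
    # Lowercase and keep alphabetic words only: a deliberately crude bag-of-words.
    return set(re.findall(r"[a-z]+", text.lower()))

def jaccard(a, b):
    # Fraction of shared words between two word sets (0.0 to 1.0).
    return len(a & b) / len(a | b) if a | b else 0.0

def group_headlines(headlines, threshold=0.2):
    # Greedy single pass: each headline joins the first group whose first member
    # is similar enough, otherwise it starts a new group.
    # The 0.2 threshold is a hypothetical value chosen for this toy example.
    groups = []
    for h in headlines:
        words = tokenize(h)
        for g in groups:
            if jaccard(words, tokenize(g[0])) >= threshold:
                g.append(h)
                break
        else:
            groups.append([h])
    return groups

headlines = [
    "Joe Biden wins US election",
    "A massive win for Joe Biden in the closest election ever ran in the US history",
    "Joe Biden wins the presidential race",
    "Stock markets rally as tech shares climb",  # hypothetical unrelated headline
]
for group in group_headlines(headlines):
    print(group)
```

This prints the three Biden headlines as one group and the made-up stock-market headline as a group of its own. Note that "win" and "wins" count as different words here; a real system would at least normalize word forms or use a richer text representation.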

More formally, Clustering partitions a dataset into N distinct groups, or clusters. These groups are semantically coherent: items within a group are similar to one another, and items in different groups are dissimilar.
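One rough way to check that a given partition fits this definition is to compare distances inside clusters with distances across clusters. The sketch below assumes the items are 2-D points, uses Euclidean distance as a hypothetical stand-in for "dissimilarity", and labels a made-up toy dataset by hand.

```python
import math
from itertools import combinations

def mean_distances(points, labels):
    # Average pairwise Euclidean distance for pairs in the same cluster
    # versus pairs in different clusters.
    within, across = [], []
    for i, j in combinations(range(len(points)), 2):
        d = math.dist(points[i], points[j])
        (within if labels[i] == labels[j] else across).append(d)
    return sum(within) / len(within), sum(across) / len(across)

# Hypothetical toy data: two tight groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels = [0, 0, 0, 1, 1, 1]  # each item belongs to exactly one cluster
w, a = mean_distances(points, labels)
print(f"within: {w:.2f}, across: {a:.2f}")
```

For a coherent partition like this one, the average within-cluster distance comes out far smaller than the average across-cluster distance; a poor partition would narrow or reverse that gap.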

K-means Clustering is one of the most popular Clustering methods used for analyzing unlabeled datasets.
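As a preview, here is a minimal sketch of the standard K-means procedure (Lloyd's algorithm) in plain Python, where k plays the role of N above. It assumes numeric points, Euclidean distance, and initialization from k randomly chosen data points; the function name and parameters are illustrative only. Libraries such as scikit-learn provide a production-ready `KMeans` class, and this sketch only shows the two alternating steps.

```python
import math
import random

def kmeans(points, k, iterations=100, seed=0):
    # Lloyd's algorithm: alternate assigning points to the nearest centroid
    # and moving each centroid to the mean of its assigned points.
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize with k distinct data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: index of the closest centroid for every point.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: recompute each centroid as the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: centroids stopped moving
        centroids = new_centroids
    return labels, centroids

# Hypothetical toy data: two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(points, k=2)
print(labels)     # first three points share one label, last three share the other
print(centroids)
```

Random initialization keeps the sketch short, but on less tidy data it can converge to a poor local optimum; running it from several seeds, or using a smarter initialization, is a common remedy.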

