What are the most common categories of clustering?

  • Exclusive Clustering: Each observation is assigned to one and only one cluster. K-Means, one of the simplest and most widely used clustering algorithms, is an example of this type. While there is no ambiguity as to which cluster an observation belongs to, this approach does not quantify uncertainty for points that lie near the boundary between clusters.
  • Probabilistic (Fuzzy) Clustering: Each observation can belong to more than one cluster, with a probability or degree of membership for each. Rather than receiving one exclusive assignment, each observation gets a distribution over the clusters. This has the advantage of quantifying uncertainty when the cluster assignment is ambiguous. A fuzzy version of K-Means (Fuzzy C-Means) implements soft clustering, and Expectation-Maximization approaches such as Gaussian Mixture Models also have this capability.
  • Hierarchical Clustering: This approach starts either with every observation in its own cluster (agglomerative or bottom-up) or with all observations in one large cluster (divisive or top-down). On each iteration, the bottom-up approach merges the two most similar clusters, while the top-down approach splits a cluster into parts that are least similar to each other. The process continues until the full hierarchy is built or a stopping criterion is met. Cutting the hierarchy at a chosen level yields a number of clusters anywhere between 1 and the total number of observations in the dataset.
  • Model-based Clustering: In this approach, clusters are represented by parametric distributions, and the data is modeled as a mixture of those distributions. For example, a set of clusters could be represented by normal distributions, each with its own mean and variance. This is referred to as a Gaussian Mixture Model, a generative approach that also produces the soft assignments described under probabilistic clustering.
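
The difference between exclusive and probabilistic assignment can be seen in a small sketch. The example below is illustrative only: it assumes a one-dimensional, two-component Gaussian mixture whose means, standard deviations, and weights are made up, and the helper names (gaussian_pdf, hard_assign, soft_assign) are hypothetical rather than taken from any library. The hard assignment picks the nearest mean, as K-Means would for fixed centroids, while the soft assignment returns the posterior probability of each component, as in the E-step of a Gaussian Mixture Model.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a 1-D normal distribution evaluated at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def hard_assign(x, means):
    """Exclusive assignment: index of the nearest cluster mean."""
    return min(range(len(means)), key=lambda k: abs(x - means[k]))


def soft_assign(x, means, stds, weights):
    """Soft assignment: posterior probability of each mixture component."""
    joint = [w * gaussian_pdf(x, m, s) for m, s, w in zip(means, stds, weights)]
    total = sum(joint)
    return [j / total for j in joint]


# Hypothetical two-cluster model; the values are assumptions for illustration.
means = [0.0, 4.0]
stds = [1.0, 1.0]
weights = [0.5, 0.5]

for x in (0.5, 1.9, 3.9):
    label = hard_assign(x, means)
    probs = soft_assign(x, means, stds, weights)
    print(f"x={x}: hard label={label}, soft={[round(p, 3) for p in probs]}")
```

A point close to one mean gets almost all of its probability from that component, while a point near the midpoint between the means gets a nearly even split. The hard label alone cannot show that the second case is a much less certain assignment.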