Since there are no labels associated with the observations in unsupervised learning, there is no direct error metric that can be applied, such as mean squared error or accuracy in supervised learning problems. However, several distance-based metrics can measure the quality of a clustering or compare the clusterings produced by different runs or algorithms.
- Within-Cluster Sum of Squares (WCSS): The WCSS is a measure of the variability of observations within clusters, calculated by summing the squared Euclidean distances between each observation and the centroid of its assigned cluster. The per-cluster values are then summed (or sometimes averaged) to give an overall WCSS for the clustering. As with error metrics in supervised learning, lower values of WCSS are preferable, indicating more compact clusters. (A code sketch illustrating this and the other metrics below follows this list.)
- Silhouette Score: For each observation, this metric compares the average distance to the other observations in its own cluster against the average distance to the observations in the nearest neighbouring cluster, for a partition produced by an algorithm such as K-Means. It ranges between -1 and 1: values close to -1 indicate that observations may have been assigned to the wrong cluster, while values close to 1 imply observations are much closer to their own cluster than to any other, which is indicative of compact, well-separated clustering. Values near 0 indicate possible overlap between clusters, meaning it is ambiguous which cluster an observation should belong to. The silhouette scores are averaged across all observations to arrive at a global measure.
- Dunn Index: The Dunn Index is the ratio of the smallest distance between observations assigned to different clusters to the largest distance between observations assigned to the same cluster. Small values imply that at least one cluster is spread out relative to the separation between clusters, while large values mean that clusters are compact and well separated, which is preferred.
- Rand Index: The Rand Index can be used to compare the results from two clustering algorithms (or two runs of the same algorithm). It is simply the ratio of the number of agreeing pairs of observations, namely pairs assigned to the same cluster in both solutions plus pairs assigned to different clusters in both solutions, to the total number of pairs of observations (nC2). Values close to 0 indicate strong disagreement between the two sets of assignments, while a value of 1 indicates perfect agreement of cluster assignments, which is preferred.
- Adjusted Rand Index (ARI): The ARI adjusts the raw Rand Index for agreement expected by chance by subtracting the expected index from both the numerator and the denominator. A motivation for preferring the ARI over the raw Rand Index is that the latter can approach 1 purely by chance when the number of clusters k is large, because a higher proportion of pairs end up assigned to different clusters in both solutions, which biases the result upward. The adjustment ensures that a random clustering receives a score of approximately 0, correcting this problem. The ARI can also be negative, which implies agreement worse than expected from a random clustering and could point to a systematic error in the data generation process.
- Mutual Information (MI): Mutual Information measures the amount of information shared between two random variables, which, in the context of unsupervised learning, translates to the similarity between two clusterings. It has the advantages of being symmetric and unaffected by permutations of the cluster labels. Just as with the Rand Index, an adjusted version (AMI) subtracts the expected mutual information to account for agreement arising by chance. There is also a normalized version (NMI) that scales the values to a range between 0 and 1, where 0 indicates no mutual information and 1 implies full concordance.
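
As a concrete illustration, the sketch below computes these metrics with scikit-learn on synthetic data. The `dunn_index` helper is a simplified, hypothetical implementation (scikit-learn does not provide one), the dataset and variable names are assumptions for demonstration only, and `rand_score` requires a reasonably recent scikit-learn version (0.24 or later).

```python
import numpy as np
from scipy.spatial.distance import cdist, pdist
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (
    adjusted_mutual_info_score,
    adjusted_rand_score,
    normalized_mutual_info_score,
    rand_score,
    silhouette_score,
)


def dunn_index(X, labels):
    """Simplified Dunn index: smallest distance between points in different
    clusters divided by the largest distance between points in the same cluster."""
    clusters = [X[labels == k] for k in np.unique(labels)]
    max_diameter = max(pdist(c).max() for c in clusters if len(c) > 1)
    min_separation = min(
        cdist(clusters[i], clusters[j]).min()
        for i in range(len(clusters))
        for j in range(i + 1, len(clusters))
    )
    return min_separation / max_diameter


# Synthetic data used only to illustrate the metrics.
X, _ = make_blobs(n_samples=500, centers=4, random_state=0)

# Two clustering solutions to compare against each other.
labels_a = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)
km_b = KMeans(n_clusters=4, n_init=10, random_state=1)
labels_b = km_b.fit_predict(X)

# Internal metrics: computed from the data and one set of assignments.
print("WCSS (inertia):", km_b.inertia_)              # lower is better
print("Silhouette:", silhouette_score(X, labels_b))  # closer to 1 is better
print("Dunn index:", dunn_index(X, labels_b))        # higher is better

# External metrics: agreement between two sets of assignments.
print("Rand index:", rand_score(labels_a, labels_b))
print("Adjusted Rand index:", adjusted_rand_score(labels_a, labels_b))
print("Adjusted MI:", adjusted_mutual_info_score(labels_a, labels_b))
print("Normalized MI:", normalized_mutual_info_score(labels_a, labels_b))
```

Because both solutions come from the same algorithm on well-separated blobs, the agreement scores will typically be close to 1; comparing one solution against randomly shuffled labels instead drives the adjusted scores toward 0, which is exactly the behaviour the chance correction is designed to produce.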