What is the Adjusted Rand Index (ARI)?

The Rand Index measures the agreement between two clusterings of the same observations, for example the outputs of two clustering algorithms. It is the ratio of the number of agreeing pairs of observations to the total number of pairs (nC2), where a pair agrees if it is assigned to the same cluster in both clusterings or to different clusters in both. The index ranges from 0 to 1. A value of 0 means the two clusterings disagree on every pair, and a value of 1 indicates perfect agreement of cluster assignments, which is preferred. A low value should not be read as assignment by random chance: two random clusterings usually still agree on many pairs, so the raw index has no fixed chance baseline.
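The pair-counting definition above can be sketched directly in Python. This is a minimal illustration, not a reference implementation: the function name `rand_index` and the list-of-labels input format are assumptions made for the example, and the loop over all pairs is quadratic in the number of observations.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Raw Rand Index between two label sequences (hypothetical helper)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same observations")
    n = len(labels_a)
    if n < 2:
        raise ValueError("at least two observations are needed to form a pair")

    agreeing = 0
    total = 0
    for i, j in combinations(range(n), 2):
        same_in_a = labels_a[i] == labels_a[j]
        same_in_b = labels_b[i] == labels_b[j]
        # A pair agrees when it is together in both clusterings
        # or apart in both clusterings.
        if same_in_a == same_in_b:
            agreeing += 1
        total += 1  # total ends up equal to nC2
    return agreeing / total


# Same partition with swapped label names: perfect agreement.
print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0
```

Only the grouping matters, not the label values, which is why renaming the clusters leaves the score unchanged.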

The Adjusted Rand Index (ARI) corrects the raw Rand Index for chance agreement by subtracting the expected index under random cluster assignment from both the numerator and the denominator. The motivation for preferring the ARI over the raw Rand Index is that the latter can be close to 1 by chance when the number of clusters k is large, because most pairs are then assigned to different clusters in both clusterings and count as agreements, which biases the result upward. In the ARI, the adjustment ensures that a random result receives an expected score of 0, while perfect agreement still scores 1. The ARI can also be negative, which means the agreement is lower than expected from a random clustering; this may point to a systematic problem, for example in the data generation process.
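The adjustment is commonly written as ARI = (Index - Expected Index) / (Max Index - Expected Index), with the terms computed from the contingency table of the two clusterings. The sketch below assumes that standard contingency-table form; the function name `adjusted_rand_index` and the choice to return 1.0 in the degenerate case are assumptions made for the example.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Chance-corrected Rand Index between two label sequences (hypothetical helper)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same observations")
    n = len(labels_a)
    if n < 2:
        raise ValueError("at least two observations are needed to form a pair")

    # Pairs placed together in both clusterings (contingency-table cells).
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs placed together within each clustering on its own.
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = together_a * together_b / comb(n, 2)  # expected index under chance
    maximum = (together_a + together_b) / 2          # largest attainable index

    if maximum == expected:
        # Degenerate case (e.g. both clusterings put everything in one cluster);
        # treating this as perfect agreement is an assumption of this sketch.
        return 1.0
    return (together_both - expected) / (maximum - expected)


print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0
print(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]))  # negative: worse than chance
```

In the second call, every pair grouped together in one clustering is split in the other, so the score falls below the chance baseline of 0.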