Machine Learning Resources

How do outliers affect the clusters formed in K-Means?

Because K-Means is a distance-based algorithm, outliers can degrade the quality of its clusters in several ways. The objective of K-Means is to minimize the within-cluster sum of squares, that is, the sum of squared distances from each observation to its cluster’s centroid. Because the distances are squared, an outlier that lies far from every centroid contributes disproportionately to this objective: it inflates the minimum the algorithm can reach and pulls the centroid of its cluster away from the bulk of the observations. A small number of outliers can also produce clusters that contain only a few observations, which obscures the practical interpretation of what the clusters represent. Scaling the data before training a clustering algorithm remains important so that no single feature dominates the distance calculation, but scaling alone does not remove outliers, so noticeable outliers should still be investigated further.
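The effect can be illustrated with a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the toy one-dimensional data, the helper names `kmeans_1d` and `fit`, and the choice to start the two centroids at the minimum and maximum of the data so that the run is reproducible (library implementations typically use randomized initialization such as k-means++ instead).

```python
def kmeans_1d(points, centroids, max_iter=100):
    """Plain Lloyd's algorithm on 1-D data (illustrative helper, not a library API).

    Returns (centroids, labels, wcss), where wcss is the within-cluster
    sum of squared distances to the assigned centroid.
    """
    centroids = list(centroids)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: (x - centroids[j]) ** 2)
            for x in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(len(centroids)):
            members = [x for x, lab in zip(points, labels) if lab == j]
            new_centroids.append(
                sum(members) / len(members) if members else centroids[j]
            )
        if new_centroids == centroids:  # assignments are stable: converged
            break
        centroids = new_centroids
    wcss = sum((x - centroids[lab]) ** 2 for x, lab in zip(points, labels))
    return centroids, labels, wcss


def fit(points):
    # Assumed deterministic initialization: min and max of the data.
    return kmeans_1d(points, [min(points), max(points)])


# Toy data: two tight groups around 1 and 5 (made-up values).
clean = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9]
with_outlier = clean + [100.0]  # one extreme observation

centroids_clean, labels_clean, wcss_clean = fit(clean)
centroids_out, labels_out, wcss_out = fit(with_outlier)

print("clean:       ", [round(c, 2) for c in centroids_clean], round(wcss_clean, 2))
print("with outlier:", [round(c, 2) for c in centroids_out], round(wcss_out, 2))
print("cluster sizes with outlier:", labels_out.count(0), labels_out.count(1))
```

Under these assumptions, the clean data splits into its two natural groups with a small within-cluster sum of squares. Once the single outlier is added, it occupies a cluster by itself, the two real groups are merged under one centroid near 3, and the objective rises from roughly 0.2 to roughly 40.2.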
