

How do outliers affect the clusters formed in K-Means?


Because clustering is a distance-based algorithm, outliers can have several undesired effects on the quality of the clusters produced. The objective of K-Means is to minimize the within-cluster sum of squares (WCSS), that is, the sum of the squared distances from each observation to its cluster's centroid. Because the distances are squared, an outlier far from every centroid contributes disproportionately to this sum. It inflates the objective compared with the same data without it, and it can pull a centroid toward itself, since each centroid is the mean of its members. A small number of outliers can also produce clusters that contain only a few observations, which obscures the practical conclusions about what the clusters represent. This is one more reason to scale the data before a clustering algorithm is trained, but scaling does not remove outliers, so noticeable outliers should still be investigated after scaling.
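To make the effect concrete, here is a minimal sketch in pure Python. It assumes 1-D toy data and a deterministic farthest-point initialization, which is used only so the run is reproducible and is not the only way K-Means can be initialized. The helper names `kmeans_1d` and `wcss` are hypothetical. The sketch runs Lloyd's algorithm with k = 2 on two tight groups, then runs it again after adding a single outlier.

```python
from typing import List, Tuple


def kmeans_1d(points: List[float], k: int, max_iter: int = 100) -> Tuple[List[float], List[int]]:
    """Lloyd's algorithm on 1-D data (illustrative sketch, not a library API)."""
    # Assumed farthest-point init: start from the first point, then keep adding
    # the point that is farthest from all centroids chosen so far.
    centroids = [float(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(abs(p - c) for c in centroids))
        centroids.append(float(farthest))

    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda j: abs(p - centroids[j])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centroids.append(sum(members) / len(members) if members else centroids[j])
        if new_centroids == centroids:  # converged: assignments no longer change
            break
        centroids = new_centroids
    return centroids, labels


def wcss(points: List[float], centroids: List[float], labels: List[int]) -> float:
    # Within-cluster sum of squares: squared distance from each point to its centroid.
    return sum((p - centroids[lab]) ** 2 for p, lab in zip(points, labels))


clean = [1, 2, 3, 10, 11, 12]
with_outlier = clean + [100]

for name, data in [("clean", clean), ("with outlier", with_outlier)]:
    centroids, labels = kmeans_1d(data, k=2)
    sizes = [labels.count(j) for j in range(2)]
    print(name, centroids, sizes, wcss(data, centroids, labels))
# clean [2.0, 11.0] [3, 3] 4.0
# with outlier [6.5, 100.0] [6, 1] 125.5
```

With the outlier present, one centroid is spent on the single point at 100 and the two real groups are merged into one cluster centered at 6.5. In this toy example that result is not only an artifact of the initialization: its WCSS of 125.5 is lower than the 5944.75 of the "natural" split {1, 2, 3} / {10, 11, 12, 100}, because squaring the distances makes it cheaper to isolate the outlier than to leave it inside a cluster.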
