Machine Learning Resources

How does the initial choice of centroids affect the K-Means algorithm?

The final cluster assignments produced by K-Means can be sensitive to the location of the initial centroids, because the algorithm converges to a local optimum rather than a guaranteed global one. For example, if an initial centroid lands on an observation that is far removed from any other points, it may attract no other observations, and in an extreme case a cluster could end up with only one data point. On the flip side, if initial centroids are chosen in close proximity to one another, they may split a single natural group among themselves while distinct groups elsewhere get merged, so the clusters fail to separate the data into distinguishable regions. For this reason, K-Means is usually run multiple times with different initializations, and the run that yields the lowest within-cluster sum of squared distances is kept. More targeted initialization strategies, such as k-means++, also exist to improve the quality of clustering.
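The effect of initialization can be illustrated with a minimal sketch in plain Python. The toy data set, the function names (`squared_distance`, `kmeans`), and the choice of starting centroids are all assumptions made for this illustration and do not come from any particular library. The code runs the basic K-Means loop from two hand-picked initializations, then repeats it from several random ones and keeps the run with the lowest within-cluster sum of squares (WCSS).

```python
import random

def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, init_centroids, max_iter=100):
    """Basic K-Means loop (illustrative). Returns (centroids, labels, wcss)."""
    centroids = [tuple(c) for c in init_centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid
        # (ties go to the centroid with the lowest index).
        new_labels = [
            min(range(len(centroids)),
                key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(xs) / len(members)
                                     for xs in zip(*members))
    wcss = sum(squared_distance(p, centroids[lab])
               for p, lab in zip(points, labels))
    return centroids, labels, wcss

# Toy data (assumed for illustration): three well-separated groups of four points.
points = [
    (0, 0), (0, 1), (1, 0), (1, 1),      # group A
    (10, 0), (10, 1), (11, 0), (11, 1),  # group B
    (10, 5), (10, 6), (11, 5), (11, 6),  # group C
]

# One starting centroid per group.
_, good_labels, good_wcss = kmeans(points, [(0, 0), (10, 0), (10, 5)])

# Two starting centroids inside group A, one in group B.
_, bad_labels, bad_wcss = kmeans(points, [(0, 0), (1, 1), (10, 0)])
bad_sizes = sorted(bad_labels.count(j) for j in range(3))

# Several random restarts; keep the run with the lowest WCSS.
rng = random.Random(0)
runs = [kmeans(points, rng.sample(points, 3)) for _ in range(10)]
best_centroids, best_labels, best_wcss = min(runs, key=lambda r: r[2])

print("good init WCSS:", good_wcss)
print("bad init WCSS:", round(bad_wcss, 2), "cluster sizes:", bad_sizes)
print("best WCSS over restarts:", round(best_wcss, 2))
```

With the second initialization, the two centroids that start in group A divide that group between them (one of them keeps a single point), while the third centroid absorbs groups B and C together. The algorithm still converges, but to a much higher WCSS than the run that starts with one centroid per group, which is why comparing WCSS across restarts is a reasonable way to pick a result.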
