How does K-Means ++ work?

K-Means ++ has been generally shown to be the best initialization approach to use when performing K-Means clustering. At a high level, it seeks to maximize the distance between points chosen as the initial cluster centroids. As one of the goals of clustering is to find clusters that distinguish between observations, it makes sense to start with centroids that are as far apart as possible. It does this by randomly choosing one data point to be the first centroid and then computing a probability for each of the other observations to be chosen in a way that the maximum probability is given to observations furthest apart from the centroids already chosen until k such observations are found. The K-Means algorithm proceeds as usual once the initial centroids have been chosen.