How does the EM algorithm (in the context of GMM) compare to K-Means?
K-Means aims to minimize the within-cluster sum of squares (WCSS), while EM aims to maximize the likelihood of the data under an assumed probability distribution (a mixture of Gaussians for a GMM). A related difference: K-Means makes hard cluster assignments, while the EM E-step computes soft responsibilities for each component.
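The hard-vs-soft contrast can be shown in a small sketch: K-Means assigns each point to its single nearest mean, while the EM E-step computes per-component responsibilities from Gaussian likelihoods. The data, component means, and shared standard deviation below are illustrative assumptions, not fitted values.

```python
import numpy as np

# Toy 1-D data and two fixed component means (illustrative values, not fitted).
x = np.array([0.0, 0.5, 4.0, 4.5])
means = np.array([0.25, 4.25])
sigma = 1.0  # shared standard deviation, assumed known for this sketch

# K-Means style hard assignment: each point goes to its nearest mean.
hard = np.argmin(np.abs(x[:, None] - means[None, :]), axis=1)

# EM E-step: soft responsibilities from Gaussian likelihoods (equal priors assumed).
lik = np.exp(-0.5 * ((x[:, None] - means[None, :]) / sigma) ** 2)
resp = lik / lik.sum(axis=1, keepdims=True)
```

Each row of `resp` sums to 1; points near a mean get a responsibility close to 1 for that component, whereas `hard` collapses that information to a single label.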
Pros: Does not require specifying the number of clusters before running the algorithm
Pros: Easy to implement
Cons: Must specify the number of clusters, k, in advance
Because K-Means clustering is a distance-based algorithm, outliers can distort the quality of the clusters produced, for example by pulling centroids away from the true cluster centers or by ending up in singleton clusters.
K-Means++ is generally regarded as the best initialization approach to use when performing K-Means clustering.
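A minimal sketch of the k-means++ idea (D² sampling): the first centroid is a uniformly random observation, and each subsequent centroid is drawn with probability proportional to its squared distance from the nearest centroid already chosen, so far-away points are favored. The data, seed, and helper name `kmeans_pp_init` are illustrative assumptions for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def kmeans_pp_init(X, k, rng):
    """Pick k initial centroids via k-means++ D^2 sampling (illustrative sketch)."""
    centroids = [X[rng.integers(len(X))]]  # first centroid: uniform random observation
    for _ in range(k - 1):
        # Squared distance from each point to its nearest already-chosen centroid.
        d2 = np.min(((X[:, None, :] - np.array(centroids)[None, :, :]) ** 2).sum(-1),
                    axis=1)
        probs = d2 / d2.sum()  # far points are proportionally more likely to be picked
        centroids.append(X[rng.choice(len(X), p=probs)])
    return np.array(centroids)

# Two well-separated 2-D blobs; the sampled centroids tend to land one per blob.
X = np.vstack([rng.normal(0, 0.2, (20, 2)), rng.normal(5, 0.2, (20, 2))])
init = kmeans_pp_init(X, 2, rng)
```

Because the second draw is weighted by squared distance, this spreads the initial centroids out and tends to avoid the poor local optima that purely random initialization can fall into.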
Using an objective function that minimizes the within-cluster sum of squares (WCSS) biases K-Means toward roughly spherical clusters, so it can struggle with elongated or irregularly shaped ones.
K-Means minimizes the total within-cluster sum of squares (WCSS)
The final cluster assignments of the K-Means algorithm can be sensitive to the location of the initial centroids.
The most common way to choose k is to run the algorithm over a range of values of k, plot the WCSS for each, and pick the "elbow" where further increases in k stop yielding large decreases in WCSS.
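A sketch of that elbow procedure on toy 1-D data with three well-separated blobs, so the sharp drop in WCSS should occur by k = 3. The tiny K-Means helper `wcss_for_k`, the data, and the seed are illustrative assumptions for this example.

```python
import numpy as np

rng = np.random.default_rng(1)
# Three well-separated 1-D blobs; the elbow in WCSS is expected near k = 3.
X = np.concatenate([rng.normal(c, 0.2, 30) for c in (0.0, 5.0, 10.0)])

def wcss_for_k(X, k, rng, n_iter=50):
    """WCSS after a short 1-D K-Means run (sketch; random-observation init)."""
    c = np.sort(rng.choice(X, size=k, replace=False))
    for _ in range(n_iter):
        labels = np.abs(X[:, None] - c[None, :]).argmin(axis=1)
        # Keep the old centroid if a cluster happens to lose all its points.
        c = np.array([X[labels == j].mean() if np.any(labels == j) else c[j]
                      for j in range(k)])
    return ((X - c[labels]) ** 2).sum()

# WCSS for k = 1..5; plotting this curve and eyeballing the bend is the elbow method.
curve = [wcss_for_k(X, k, rng) for k in range(1, 6)]
```

WCSS always decreases (or stays flat) as k grows in expectation, which is why the heuristic looks for the bend in the curve rather than the minimum.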
K-Means starts by selecting initial centroids for the k clusters, most simply by randomly choosing k observations from the data.
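The full loop can be sketched in NumPy as Lloyd's algorithm: random-observation initialization, then alternating assignment and update steps until the centroids stop moving. The data, seed, and function name `kmeans` are illustrative assumptions, not a reference implementation.

```python
import numpy as np

rng = np.random.default_rng(42)

def kmeans(X, k, rng, n_iter=100):
    """Minimal Lloyd's algorithm: random-observation init, then assign/update loop."""
    centroids = X[rng.choice(len(X), size=k, replace=False)]  # k random observations
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(axis=1)
        # Update step: each centroid moves to the mean of its assigned points;
        # an empty cluster simply keeps its previous centroid in this sketch.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):  # converged: no centroid moved
            break
        centroids = new
    wcss = d2.min(axis=1).sum()  # the objective K-Means minimizes
    return centroids, labels, wcss

# Two well-separated 2-D blobs that the algorithm should recover as clusters.
X = np.vstack([rng.normal(0, 0.3, (25, 2)), rng.normal(4, 0.3, (25, 2))])
centroids, labels, wcss = kmeans(X, 2, rng)
```

Rerunning with a different seed can change the initial centroids and therefore the path the algorithm takes, which is the sensitivity to initialization noted above.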