How does high-dimensional data affect clustering, and how can the quality of the resulting clusters be improved in such cases?
What are some options for clustering categorical data? What if the dataset contains a mix of numeric and categorical features?
What is the effect of minimizing the within-cluster sum of squares on the shapes of clusters produced in K-Means?