Machine Learning Resources

How is clustering affected by high-dimensional data, and how can the quality of clusters generated be improved in such cases?

One problem with clustering high-dimensional data is that common distance metrics, such as Euclidean distance, become less informative as the number of dimensions grows. This is one consequence of the curse of dimensionality: the distances between pairs of points concentrate around a similar value, so a point's nearest and farthest neighbors become hard to tell apart. One approach is to first reduce the dimensionality with a technique such as PCA and then cluster on the leading principal components rather than the original features. Alternatively, density-based algorithms such as DBSCAN can be better suited than K-Means for identifying clusters in high-dimensional data, although DBSCAN also relies on a distance metric and is therefore not immune to the same problem.
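
The loss of distance contrast described above can be seen in a small experiment. The following is a minimal sketch using only the Python standard library, not a rigorous benchmark. It assumes points drawn uniformly from the unit hypercube, and the function name `relative_contrast`, the sample size of 200, and the chosen dimensions are all illustrative choices. It measures how much farther the farthest point is from a query than the nearest one, relative to the nearest distance.

```python
import math
import random


def relative_contrast(n_points: int, n_dims: int, seed: int = 0) -> float:
    """Return (d_max - d_min) / d_min for Euclidean distances from one
    random query point to n_points random points in the unit hypercube.

    Illustrative helper (not from any library): a large value means the
    nearest and farthest points are easy to tell apart; a value near zero
    means all points look roughly equally far away.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    query = [rng.random() for _ in range(n_dims)]
    distances = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(n_dims)]
        distances.append(math.dist(query, point))  # Euclidean distance
    d_min, d_max = min(distances), max(distances)
    return (d_max - d_min) / d_min


if __name__ == "__main__":
    for dims in (2, 10, 100, 1000):
        contrast = relative_contrast(200, dims)
        print(f"{dims:>5} dims: relative contrast = {contrast:.3f}")
```

Under these assumptions the relative contrast should shrink sharply as the number of dimensions increases, which is why distance-based clustering tends to benefit from reducing the dimensionality first.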
