
AIML.com

Machine Learning Resources

What are some options for clustering on categorical data? What if the dataset contains a combination of numeric and categorical features?

  • K-Modes: K-Modes is a modification of K-Means for datasets in which all features are categorical. Instead of numerical distance, it measures dissimilarity by counting the features on which two observations mismatch, and it represents each cluster by the mode (most frequent category) of each feature rather than the mean. Cluster assignment and iteration otherwise proceed in the same way as in K-Means.
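  • K-Medoids (PAM Clustering): PAM stands for Partitioning Around Medoids. Each cluster is represented by a medoid, an actual observation from the dataset, so the algorithm only needs pairwise dissimilarities and can work with any dissimilarity measure. This makes it suitable for mixed data types when paired with the Gower Distance, which computes a partial dissimilarity per feature according to its data type (for example, a range-normalized absolute difference for numeric features and a match/mismatch for categorical ones) and averages them. PAM clustering is more robust to outliers than K-Means but can be computationally expensive on large datasets, since it works from the pairwise dissimilarities between observations.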
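
The two dissimilarity measures described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not a reference implementation: the function names (`matching_dissimilarity`, `k_modes`, `gower_distance`), the toy records, and the choice of initial modes are all hypothetical, and the Gower variant shown is the simple unweighted form (range-normalized difference for numeric features, 0/1 mismatch for categorical ones). A full PAM loop is not shown; it would take these Gower distances as its input.

```python
from collections import Counter

def matching_dissimilarity(a, b):
    """Count the features on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))

def k_modes(rows, init_modes, max_iter=20):
    """Toy K-Modes: assign each row to its nearest mode, then recompute modes."""
    modes = [tuple(m) for m in init_modes]
    labels = []
    for _ in range(max_iter):
        # Assignment step: nearest mode under simple matching dissimilarity.
        labels = [
            min(range(len(modes)), key=lambda j: matching_dissimilarity(r, modes[j]))
            for r in rows
        ]
        # Update step: per-feature most frequent category within each cluster.
        new_modes = []
        for j, old in enumerate(modes):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if not members:
                new_modes.append(old)  # keep the old mode for an empty cluster
                continue
            new_modes.append(
                tuple(Counter(col).most_common(1)[0][0] for col in zip(*members))
            )
        if new_modes == modes:
            break  # converged: modes did not change
        modes = new_modes
    return labels, modes

def gower_distance(a, b, is_numeric, ranges):
    """Unweighted Gower distance between two mixed-type records.

    is_numeric[i] flags numeric features; ranges[i] is the observed range
    (max - min) of numeric feature i and is ignored for categorical ones.
    """
    parts = []
    for x, y, numeric, rng in zip(a, b, is_numeric, ranges):
        if numeric:
            parts.append(abs(x - y) / rng if rng else 0.0)
        else:
            parts.append(0.0 if x == y else 1.0)
    return sum(parts) / len(parts)

# Hypothetical all-categorical data: (color, size, shape).
rows = [
    ("red", "small", "round"),
    ("red", "small", "oval"),
    ("blue", "large", "square"),
    ("blue", "large", "round"),
]
labels, modes = k_modes(rows, init_modes=[rows[0], rows[2]])

# Hypothetical mixed records: (age, area), with an assumed age range of 40.
d_same_area = gower_distance((25, "urban"), (45, "urban"), [True, False], [40, None])
d_diff_area = gower_distance((25, "urban"), (45, "rural"), [True, False], [40, None])
```

In practice the initial modes would be chosen randomly or by a seeding heuristic, and the pairwise Gower distances would be collected into a matrix and passed to a K-Medoids implementation that accepts precomputed dissimilarities.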
