Categorical data clustering: 25 years beyond K-modes
Tai Dinh, Wong Hauchi, Philippe Fournier-Viger, Daniil Lisik, Minh-Quyet Ha, Hieu-Chi Dam, Van-Nam Huynh

TL;DR
This paper reviews 25 years of research on clustering categorical data, highlighting key algorithms and their applications across various fields, and discusses current challenges and future opportunities.
Contribution
It provides a comprehensive synthesis of categorical data clustering developments since K-modes, including practical algorithm comparisons and performance insights on benchmark datasets.
Findings
Performance of recent algorithms on benchmark datasets
Distinguishing features of various clustering methodologies
Challenges and opportunities in categorical data clustering
Abstract
The clustering of categorical data is a common and important task in computer science, offering profound implications across a spectrum of applications. Unlike purely numerical data, categorical data often lack inherent ordering as in nominal data, or have varying levels of order as in ordinal data, thus requiring specialized methodologies for efficient organization and analysis. This review provides a comprehensive synthesis of categorical data clustering in the past twenty-five years, starting from the introduction of K-modes. It elucidates the pivotal role of categorical data clustering in diverse fields such as health sciences, natural sciences, social sciences, education, engineering and economics. Practical comparisons are conducted for algorithms having public implementations, highlighting distinguishing clustering methodologies and revealing the performance of recent algorithms…
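For orientation, the K-modes idea that the review takes as its starting point can be sketched in a few lines: records are compared with a simple matching dissimilarity (the number of attributes on which they differ), and each cluster is summarized by its per-attribute mode instead of a mean. The sketch below is illustrative only. The function names, the toy records, and the choice of initial modes are assumptions made here; it is not the authors' code or any of the implementations compared in the paper.

```python
from collections import Counter


def matching_dissimilarity(a, b):
    """Count the attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def column_modes(records):
    """Most frequent category per attribute (ties go to the first one seen)."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*records))


def k_modes(records, initial_modes, max_iter=100):
    """Alternate assignment and mode update until the labels stop changing."""
    modes = [tuple(m) for m in initial_modes]
    labels = None
    for _ in range(max_iter):
        # Assign each record to the nearest mode under simple matching.
        new_labels = [
            min(range(len(modes)),
                key=lambda j: matching_dissimilarity(r, modes[j]))
            for r in records
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Recompute each cluster's mode; an empty cluster keeps its old mode.
        for j in range(len(modes)):
            members = [r for r, lab in zip(records, labels) if lab == j]
            if members:
                modes[j] = column_modes(members)
    return labels, modes


# Hypothetical toy data: (colour, size, shape) records.
records = [
    ("red", "small", "round"),
    ("red", "small", "oval"),
    ("blue", "large", "square"),
    ("blue", "large", "round"),
    ("red", "medium", "round"),
]
labels, modes = k_modes(records, initial_modes=[records[0], records[2]])
print(labels, modes)
```

Initial modes are passed in explicitly so the run is deterministic; practical implementations typically choose them with a dedicated initialization scheme, and the result can depend on that choice.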
Taxonomy
Topics: Advanced Clustering Algorithms Research
