Clustering Categorical Data: Soft Rounding k-modes
Surya Teja Gavva, Karthik C. S., and Sharath Punna

TL;DR
This paper introduces SoftModes, a soft rounding variant of the k-modes clustering algorithm for categorical data. Under a generative block model in which k-modes performs poorly, SoftModes provably addresses these drawbacks, and it also performs well empirically on synthetic and real-world data.
Contribution
The paper proposes SoftModes, a soft rounding variant of k-modes that provably addresses the limitations of k-modes in a natural generative block model, and validates it empirically.
Findings
SoftModes outperforms k-modes on synthetic data.
SoftModes performs well on real-world datasets.
Theoretical analysis shows that SoftModes overcomes the drawbacks of k-modes in the generative block model.
Abstract
Over the last three decades, researchers have intensively explored various clustering tools for categorical data analysis. Despite the proposal of various clustering algorithms, the classical k-modes algorithm remains a popular choice for unsupervised learning of categorical data. Surprisingly, our first insight is that in a natural generative block model, the k-modes algorithm performs poorly for a large range of parameters. We remedy this issue by proposing a soft rounding variant of the k-modes algorithm (SoftModes) and theoretically prove that our variant addresses the drawbacks of the k-modes algorithm in the generative model. Finally, we empirically verify that SoftModes performs well on both synthetic and real-world datasets.
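The abstract contrasts classical k-modes with a soft rounding variant of it. The sketch below implements classical k-modes: each point is assigned to its nearest mode under Hamming distance, and each mode is recomputed attribute by attribute as the most frequent category in its cluster (hard rounding). Behind a `soft` flag it adds a hypothetical soft rounding of the mode step that samples each category with probability proportional to its frequency. That sampling rule is an assumption for illustration only and is not claimed to be the paper's SoftModes update; the function names (`hamming`, `assign`, `update_mode`, `k_modes`) and the random-point initialization are also our own choices.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple

Point = Tuple[str, ...]


def hamming(a: Sequence[str], b: Sequence[str]) -> int:
    """Number of attributes on which two categorical points disagree."""
    return sum(x != y for x, y in zip(a, b))


def assign(points: List[Point], modes: List[Point]) -> List[int]:
    """Nearest mode under Hamming distance; ties go to the lowest index."""
    return [min(range(len(modes)), key=lambda j: hamming(p, modes[j])) for p in points]


def update_mode(cluster: List[Point], soft: bool, rng: random.Random) -> Point:
    """Recompute a cluster's mode one attribute (column) at a time."""
    mode = []
    for column in zip(*cluster):
        counts = Counter(column)
        if soft:
            # HYPOTHETICAL soft rounding: sample a category with probability
            # proportional to its frequency in the cluster (illustrative only).
            categories, weights = zip(*counts.items())
            mode.append(rng.choices(categories, weights=weights, k=1)[0])
        else:
            # Classical k-modes (hard rounding): the most frequent category.
            mode.append(counts.most_common(1)[0][0])
    return tuple(mode)


def k_modes(points: List[Point], k: int, iters: int = 20,
            soft: bool = False, seed: int = 0) -> Tuple[List[int], List[Point]]:
    rng = random.Random(seed)
    modes = rng.sample(points, k)  # initial modes: k data points chosen at random
    labels = assign(points, modes)
    for _ in range(iters):
        # Mode step; an empty cluster keeps its previous mode.
        modes = [
            update_mode([p for p, c in zip(points, labels) if c == j], soft, rng)
            if j in labels else modes[j]
            for j in range(k)
        ]
        new_labels = assign(points, modes)  # assignment step
        if new_labels == labels and not soft:
            break  # hard k-modes has converged
        labels = new_labels
    return labels, modes
```

For instance, on six four-attribute points drawn from two disjoint alphabets (three per group), `k_modes(points, 2)` places each group in its own cluster. The soft variant never stops early, because its sampled modes can keep changing; it simply runs for `iters` rounds.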
Taxonomy
Topics: Advanced Clustering Algorithms Research · Bayesian Methods and Mixture Models · Anomaly Detection Techniques and Applications
