Approximation Algorithms for K-Modes Clustering
Zengyou He

TL;DR
This paper relates the k-modes objective for clustering categorical data to k-median, derives a deterministic 2-approximation algorithm from that relation, and shows that existing approximation algorithms for metric k-median extend to k-modes. Empirical results support the method.
Contribution
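It shows that the optimum of k-median is at most twice the optimum of k-modes on the same categorical data, derives a deterministic 2-approximation algorithm from that bound, and proves that the k-modes distance measure is a metric, so existing approximation algorithms for metric k-median extend to k-modes.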
Findings
The k-modes distance measure is a metric.
A deterministic 2-approximation algorithm for k-modes is proposed.
Empirical results show the method's superiority over existing approaches.
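To make the first finding concrete, here is a minimal Python sketch. It assumes the k-modes distance is the usual simple matching dissimilarity (the number of attributes on which two records differ); the page itself does not spell out the definition. The function names `matching_distance` and `is_metric_on` are hypothetical, and the exhaustive check only illustrates the metric axioms on a small finite set. It is not the paper's proof.

```python
from itertools import product


def matching_distance(x, y):
    """Count the attributes on which two categorical records differ.

    Assumption: this simple matching dissimilarity is the k-modes distance.
    """
    if len(x) != len(y):
        raise ValueError("records must have the same number of attributes")
    return sum(a != b for a, b in zip(x, y))


def is_metric_on(points):
    """Brute-force check of the metric axioms over a finite set of records."""
    for x, y, z in product(points, repeat=3):
        d_xy = matching_distance(x, y)
        if d_xy != matching_distance(y, x):       # symmetry
            return False
        if (d_xy == 0) != (x == y):               # identity of indiscernibles
            return False
        if matching_distance(x, z) > d_xy + matching_distance(y, z):
            return False                          # triangle inequality
    return True


# Every record over three small, made-up attribute domains (12 records).
records = list(product("ab", "xyz", "01"))
metric_holds = is_metric_on(records)
```

The check passes because each attribute contributes a 0/1 mismatch indicator that itself satisfies the triangle inequality, and the distance is the sum of those indicators.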
Abstract
In this paper, we study clustering with respect to the k-modes objective function, a natural formulation of clustering for categorical data. One of the main contributions of this paper is to establish the connection between k-modes and k-median, i.e., the optimum of k-median is at most twice the optimum of k-modes for the same categorical data clustering problem. Based on this observation, we derive a deterministic algorithm that achieves an approximation factor of 2. Furthermore, we prove that the distance measure in k-modes defines a metric. Hence, we are able to extend existing approximation algorithms for metric k-median to k-modes. Empirical results verify the superiority of our method.
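The factor-of-2 relation stated in the abstract can be illustrated by brute force on a toy data set. The sketch below rests on several assumptions that are not taken from the paper: "k-median" is read as the variant whose centers must be chosen among the data records, the distance is the simple matching dissimilarity, and the records and helper names (`clustering_cost`, `best_cost`) are invented for illustration. It enumerates every choice of centers, so it is only feasible for tiny inputs and is not the paper's algorithm.

```python
from itertools import combinations, product


def matching_distance(x, y):
    """Count the attributes on which two categorical records differ."""
    return sum(a != b for a, b in zip(x, y))


def clustering_cost(records, centers):
    """Sum, over all records, of the distance to the nearest center."""
    return sum(min(matching_distance(r, c) for c in centers) for r in records)


def best_cost(records, candidates, k):
    """Exhaustive minimum cost over all k-subsets of candidate centers."""
    return min(clustering_cost(records, cs) for cs in combinations(candidates, k))


# Hypothetical toy data: six records with three categorical attributes.
records = [
    ("red", "small", "round"),
    ("red", "large", "round"),
    ("blue", "small", "square"),
    ("green", "small", "round"),
    ("blue", "large", "square"),
    ("green", "large", "flat"),
]
k = 2

# k-modes centers may be any combination of observed attribute values.
domains = [sorted(set(column)) for column in zip(*records)]
all_vectors = list(product(*domains))

kmodes_opt = best_cost(records, all_vectors, k)
# Assumed k-median variant: centers restricted to the data records themselves.
kmedian_opt = best_cost(records, sorted(set(records)), k)

print(kmodes_opt, kmedian_opt)
```

Under these assumptions the bound follows from the triangle inequality: replacing each optimal mode by the nearest record in its cluster at most doubles each record's distance to its center. This is why a solution to the restricted problem is a natural starting point for a 2-approximation of k-modes.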
Taxonomy
Topics: Rough Sets and Fuzzy Logic · Advanced Clustering Algorithms Research · Face and Expression Recognition
