CADM: Cluster-customized Adaptive Distance Metric for Categorical Data Clustering
Taixi Chen, Yiu-ming Cheung, Yiqun Zhang

TL;DR
This paper introduces CADM, a novel cluster-customized distance metric for categorical and mixed data clustering that adapts to attribute distributions within each cluster, improving clustering accuracy.
Contribution
The paper proposes a new adaptive distance metric tailored for categorical and mixed data, accounting for cluster-specific attribute distributions, which enhances clustering performance.
Findings
Achieves an average ranking of around first across fourteen datasets
Effectively handles mixed numerical and categorical data
Demonstrates improved clustering accuracy
Abstract
An appropriate distance metric is crucial for categorical data clustering, because the distance between categorical values cannot be calculated directly. However, the distances between attribute values usually vary across clusters, since the values are distributed differently in each cluster. This variation has not been taken into account, which leads to unreasonable distance measurement. We therefore propose a cluster-customized distance metric for categorical data clustering, which can competitively update distances based on the distribution of attributes in each cluster. In addition, we extend the proposed distance metric to mixed data containing both numerical and categorical attributes. Experiments demonstrate the efficacy of the proposed method, which achieves an average ranking of around first on fourteen datasets. The source code is available at https://anonymous.4open.science/r/CADM-47D8
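To make the idea of a cluster-dependent distance concrete, here is a minimal Python sketch. It is not the CADM metric from the paper, whose formulation is not given on this page. It swaps in a simple frequency-based object-to-cluster dissimilarity, in which the same attribute value is "closer" to a cluster where it is common. The function names `value_frequencies` and `object_cluster_distance` and the toy data are assumptions made for illustration only.

```python
from collections import Counter


def value_frequencies(cluster, n_attrs):
    """Relative frequency of each category, per attribute, inside one cluster."""
    size = len(cluster)
    return [
        {value: count / size
         for value, count in Counter(row[j] for row in cluster).items()}
        for j in range(n_attrs)
    ]


def object_cluster_distance(obj, freqs):
    """Illustrative dissimilarity (an assumption, not CADM):
    mean over attributes of 1 - in-cluster frequency of the object's value."""
    return sum(1.0 - f.get(v, 0.0) for v, f in zip(obj, freqs)) / len(obj)


# Toy clusters with two categorical attributes: (colour, size).
cluster_a = [("red", "small"), ("red", "small"), ("red", "large"), ("blue", "small")]
cluster_b = [("blue", "large"), ("blue", "large"), ("green", "large"), ("red", "large")]

freqs_a = value_frequencies(cluster_a, 2)
freqs_b = value_frequencies(cluster_b, 2)

obj = ("red", "small")
d_a = object_cluster_distance(obj, freqs_a)  # "red" and "small" are common in A
d_b = object_cluster_distance(obj, freqs_b)  # "red" is rare and "small" absent in B
print(d_a, d_b)
```

In this toy setup the object `("red", "small")` gets a smaller dissimilarity to `cluster_a` than to `cluster_b`, because the measure depends on each cluster's own value distribution rather than on one global table of value-to-value distances.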
Taxonomy
Topics: Advanced Clustering Algorithms Research · Bayesian Methods and Mixture Models · Face and Expression Recognition
