CADM: Cluster-customized Adaptive Distance Metric for Categorical Data Clustering

Taixi Chen; Yiu-ming Cheung; Yiqun Zhang

arXiv:2511.05826 · cs.LG · March 9, 2026

TL;DR

This paper introduces CADM, a novel cluster-customized distance metric for categorical and mixed data clustering that adapts to attribute distributions within each cluster, improving clustering accuracy.

Contribution

The paper proposes a new adaptive distance metric tailored for categorical and mixed data, accounting for cluster-specific attribute distributions, which enhances clustering performance.

Findings

01 Achieves an average ranking of around first across fourteen datasets

02 Effectively handles mixed numerical and categorical data

03 Demonstrates improved clustering accuracy

Abstract

An appropriate distance metric is crucial for categorical data clustering, as the distance between categorical values cannot be calculated directly. However, the distances between attribute values usually vary across clusters because the attribute distributions differ from cluster to cluster. This has not been taken into account, which leads to unreasonable distance measurement. Therefore, we propose a cluster-customized distance metric for categorical data clustering, which can competitively update distances based on the different distributions of attributes in each cluster. In addition, we extend the proposed distance metric to mixed data that contains both numerical and categorical attributes. Experiments demonstrate the efficacy of the proposed method, i.e., achieving an average ranking of around first across fourteen datasets. The source code is available at https://anonymous.4open.science/r/CADM-47D8
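To make the idea of a distance that depends on each cluster's attribute distribution concrete, here is a minimal Python sketch. It is not the CADM metric from the paper; it uses a simple frequency-based object-to-cluster dissimilarity as a stand-in, and the function names `cluster_value_frequencies` and `object_cluster_distance` as well as the toy data are assumptions made for illustration only.

```python
from collections import Counter


def cluster_value_frequencies(cluster):
    """Return, per attribute, the relative frequency of each value in the cluster."""
    n = len(cluster)
    num_attrs = len(cluster[0])
    return [
        {value: count / n
         for value, count in Counter(row[a] for row in cluster).items()}
        for a in range(num_attrs)
    ]


def object_cluster_distance(obj, freqs):
    """Illustrative dissimilarity (not CADM): for each attribute, add
    1 minus the relative frequency of the object's value in the cluster."""
    return sum(1.0 - f.get(value, 0.0) for value, f in zip(obj, freqs))


# Toy clusters with two categorical attributes (colour, size).
cluster_a = [("red", "small"), ("red", "small"), ("red", "large")]
cluster_b = [("blue", "large"), ("green", "large"), ("blue", "large")]

freqs_a = cluster_value_frequencies(cluster_a)
freqs_b = cluster_value_frequencies(cluster_b)

query = ("red", "large")
dist_a = object_cluster_distance(query, freqs_a)
dist_b = object_cluster_distance(query, freqs_b)
```

In this sketch the same value contributes differently depending on the cluster: "large" is rare in `cluster_a` and universal in `cluster_b`, so its contribution to the distance differs between the two. The paper's metric additionally updates distances competitively and extends to numerical attributes, neither of which is modelled here.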

Taxonomy

Topics: Advanced Clustering Algorithms Research · Bayesian Methods and Mixture Models · Face and Expression Recognition