Break the Tie: Learning Cluster-Customized Category Relationships for Categorical Data Clustering
Mingjie Zhao, Zhanpei Huang, Yang Lu, Mengke Li, Yiqun Zhang, Weifeng Su, Yiu-ming Cheung

TL;DR
This paper introduces a novel clustering method that learns customized relationships between the categories of categorical attributes. By relaxing the assumption of fixed category relationships, it improves clustering accuracy, especially on mixed datasets.
Contribution
It proposes a new approach for learning category relationships that adapt to the cluster structure and remain Euclidean-compatible, improving clustering performance over methods that assume fixed relationships.
Findings
Significantly improved clustering accuracy on benchmark datasets.
Learned category relationships are compatible with the Euclidean distance metric.
Outperforms existing methods with an average ranking of 1.25.
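What "Euclidean-compatible" means can be checked concretely. The sketch below uses the classical Schoenberg/Torgerson criterion as an illustration; nothing above confirms that the paper uses this test. A symmetric distance matrix with a zero diagonal can be realized by points in Euclidean space if and only if the double-centred Gram matrix $G = -\tfrac{1}{2} J D^{\circ 2} J$ is positive semidefinite. The function name `is_euclidean_embeddable` and the tolerance are assumptions made for this example.

```python
import numpy as np

def is_euclidean_embeddable(dist, tol=1e-9):
    """Check whether a category distance matrix can be realized in Euclidean space.

    Illustrative sketch (classical Schoenberg / Torgerson criterion), not the
    paper's own procedure: G = -1/2 * J @ (D**2) @ J must be positive
    semidefinite, where J = I - (1/n) * 11^T is the centring matrix.
    """
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    # A valid distance matrix must be square, symmetric, and zero on the diagonal.
    if d.shape != (n, n) or not np.allclose(d, d.T) or not np.allclose(np.diag(d), 0.0):
        return False
    j = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * j @ (d ** 2) @ j
    # eigvalsh is appropriate because the Gram matrix is symmetric.
    eigvals = np.linalg.eigvalsh(gram)
    return bool(eigvals.min() >= -tol)

# Fixed Hamming-style relationship over 3 categories: an equilateral triangle.
print(is_euclidean_embeddable([[0, 1, 1], [1, 0, 1], [1, 1, 0]]))
# A star-shaped relationship (one hub category at distance 1 from three
# mutually distant categories) is a valid metric but not a Euclidean one.
print(is_euclidean_embeddable([[0, 1, 1, 1], [1, 0, 2, 2], [1, 2, 0, 2], [1, 2, 2, 0]]))
```

The second case shows why the property is worth stating. A learned category relationship can satisfy the triangle inequality and still fail to be embeddable in Euclidean space. In that case it cannot be combined directly with Euclidean distances on numerical attributes.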
Abstract
Categorical attributes with qualitative values are ubiquitous in cluster analysis of real datasets. Numerical attributes come with the Euclidean distance, but categorical attributes lack well-defined relationships among their possible values (also called categories), which hampers the discovery of compact categorical data clusters. Most existing work focuses on developing appropriate distance metrics, yet these metrics typically assume a fixed topological relationship between categories. This assumption limits their adaptability to varying cluster structures and often leads to suboptimal clustering performance. This paper therefore breaks the intrinsic relationship tie of attribute categories and learns customized distance metrics suited to revealing various cluster distributions flexibly and accurately. As a result, the fitting ability of the…
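The contrast between a fixed category relationship and a cluster-customized one can be illustrated with a toy example. The sketch below is hypothetical and is not the authors' learning procedure, which is not described in this summary. The fixed baseline is the Hamming-style distance, where any two distinct categories are equally far apart. The customized variant represents each category by its distribution over the current clusters and measures Euclidean distance between these profiles, so the result is Euclidean by construction. The function names and the profile-based scheme are assumptions introduced for illustration.

```python
import numpy as np

def hamming_category_distance(n_categories):
    """Fixed relationship: every pair of distinct categories is at distance 1."""
    return 1.0 - np.eye(n_categories)

def cluster_profile_category_distance(codes, labels, n_categories, n_clusters):
    """Hypothetical cluster-informed distance between the categories of one attribute.

    codes  : integer category index of each object for this attribute
    labels : integer cluster index of each object
    Each category is represented by its empirical distribution over clusters;
    distances are Euclidean between these profile vectors.
    """
    codes = np.asarray(codes, dtype=int)
    labels = np.asarray(labels, dtype=int)
    # Category-by-cluster contingency table.
    counts = np.zeros((n_categories, n_clusters))
    np.add.at(counts, (codes, labels), 1.0)
    totals = counts.sum(axis=1, keepdims=True)
    # Normalise rows to distributions; unseen categories get an all-zero profile.
    profiles = np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)
    diff = profiles[:, None, :] - profiles[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# Toy attribute with categories 0, 1, 2: categories 0 and 1 occur only in cluster 0,
# category 2 only in cluster 1.
codes = [0, 0, 1, 1, 2, 2]
labels = [0, 0, 0, 0, 1, 1]
print(hamming_category_distance(3))
print(cluster_profile_category_distance(codes, labels, n_categories=3, n_clusters=2))
```

Under the fixed relationship, all three categories are mutually equidistant. Under the cluster-informed one, categories 0 and 1 collapse to distance 0, and category 2 sits at $\sqrt{2}$ from both. This mirrors, in a simplified way, the idea of letting category relationships depend on the cluster structure rather than fixing them in advance.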
Taxonomy
Topics: Advanced Clustering Algorithms Research · Bayesian Methods and Mixture Models · Face and Expression Recognition
