Categorical Data Clustering via Value Order Estimated Distance Metric Learning
Yiqun Zhang, Mingjie Zhao, Hong Jia, Yang Lu, Mengke Li, and Yiu-ming Cheung

TL;DR
This paper introduces a novel order distance metric learning approach for clustering categorical data, which improves accuracy and interpretability by learning optimal value order relationships and integrating them into a unified clustering framework.
Contribution
It proposes a new joint learning paradigm that alternates between clustering and order distance metric learning, specifically designed for categorical and mixed datasets.
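The alternating scheme can be illustrated with a minimal sketch. It is not the authors' algorithm: the equally spaced value positions, the medoid-style center update, the brute-force search over permutations (factorial cost, so only feasible for attributes with a handful of values), and all function names are assumptions made for illustration.

```python
from itertools import permutations

def dist(order, a, b):
    # Assumption: values sit at equally spaced positions on [0, 1].
    if len(order) < 2:
        return 0.0
    return abs(order.index(a) - order.index(b)) / (len(order) - 1)

def obj_dist(orders, x, c):
    # Sum of per-attribute order distances between object x and center c.
    return sum(dist(o, xa, ca) for o, xa, ca in zip(orders, x, c))

def assign(data, centers, orders):
    # Clustering step: each object goes to its nearest center.
    return [min(range(len(centers)),
                key=lambda j: obj_dist(orders, x, centers[j]))
            for x in data]

def update_centers(data, labels, centers, orders):
    new = []
    for j, old in enumerate(centers):
        members = [x for x, l in zip(data, labels) if l == j]
        if not members:
            new.append(old)  # keep the center of an empty cluster
            continue
        # Per attribute, pick the value with the smallest total distance
        # to the cluster members.
        new.append(tuple(
            min(o, key=lambda v: sum(dist(o, v, x[r]) for x in members))
            for r, o in enumerate(orders)))
    return new

def update_orders(data, labels, centers, orders):
    # Order-learning step: per attribute, keep the permutation with the
    # lowest total within-cluster distance (ties keep the current order).
    new = []
    for r, o in enumerate(orders):
        def cost(p):
            return sum(dist(p, x[r], centers[l][r])
                       for x, l in zip(data, labels))
        new.append(min(permutations(o), key=cost))
    return new

def joint_cluster(data, centers, orders, max_iter=20):
    labels = None
    for _ in range(max_iter):
        new_labels = assign(data, centers, orders)
        if new_labels == labels:  # assignments stable: stop
            break
        labels = new_labels
        centers = update_centers(data, labels, centers, orders)
        orders = update_orders(data, labels, centers, orders)
    return labels, centers, orders

# Toy data: two categorical attributes (hypothetical values).
data = [("low", "red"), ("low", "red"), ("mid", "red"),
        ("high", "blue"), ("high", "blue")]
labels, centers, orders = joint_cluster(
    data,
    centers=[("low", "red"), ("high", "blue")],
    orders=[["low", "mid", "high"], ["red", "blue"]],
)
```

The loop stops once an assignment pass leaves the labels unchanged, which mirrors the usual stopping rule for alternating clustering procedures.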
Findings
Achieves superior clustering accuracy on categorical datasets
Makes categorical data easier to interpret by representing attribute values through a learned order
Validated through experiments, ablation studies, and significance tests
Abstract
Clustering is a popular machine learning technique for data mining that can process and analyze datasets to automatically reveal sample distribution patterns. Because categorical data, although ubiquitous, naturally lack a well-defined metric space such as the Euclidean distance space of numerical data, their distribution is usually under-represented, and valuable information is easily distorted during clustering. This paper therefore introduces a novel order distance metric learning approach that represents categorical attribute values intuitively by learning their optimal order relationship and quantifying their distances along a line, as is done for numerical attributes. Since subjectively created qualitative categorical values involve ambiguity and fuzziness, the order distance metric is learned in the context of clustering. Accordingly, a new joint learning paradigm…
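To make the idea of "distance along a line" concrete, the sketch below places the values of one categorical attribute at positions determined by an assumed order and measures distance as the gap between positions. The equal spacing and the name order_distance are assumptions for illustration; the paper learns the order and the distances rather than fixing them.

```python
def order_distance(order, a, b):
    """Distance between two categorical values under a given value order.

    Assumption of this sketch: values are equally spaced on [0, 1].
    """
    if len(order) < 2:
        return 0.0
    pos = {value: i for i, value in enumerate(order)}
    return abs(pos[a] - pos[b]) / (len(order) - 1)

# Hypothetical attribute with three values in an assumed order.
order = ["low", "medium", "high"]
d_adjacent = order_distance(order, "low", "medium")
d_extreme = order_distance(order, "low", "high")
```

Under a different order, for example ["medium", "low", "high"], the same pair of values would receive a different distance, which is why the choice of order matters for clustering.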
Taxonomy
Topics: Advanced Clustering Algorithms Research
