Hierarchical clustering of mixed-type data based on barycentric coding
Odysseas Moschidis, Angelos Markos, Theodore Chadjipadelis

TL;DR
This paper introduces a new hierarchical clustering method for mixed-type data that uses barycentric coding of continuous variables to reduce information loss and improve clustering quality.
Contribution
It proposes a novel agglomerative hierarchical clustering approach that leverages barycentric coding, addressing limitations of traditional methods on mixed data types.
Findings
Effective on real and simulated datasets
Reduces information loss compared to discretization
Compatible with correspondence analysis framework
Abstract
Clustering of mixed-type datasets can be a particularly challenging task as it requires taking into account the associations between variables with different levels of measurement, i.e., nominal, ordinal and/or interval. In some cases, hierarchical clustering is considered a suitable approach, as it makes few assumptions about the data and its solution can be easily visualized. Since most hierarchical clustering approaches assume variables are measured on the same scale, a simple strategy for clustering mixed-type data is to homogenize the variables before clustering. This would mean either recoding the continuous variables as categorical ones or vice versa. However, typical discretization of continuous variables implies loss of information. In this work, an agglomerative hierarchical clustering approach for mixed-type data is proposed, which relies on a barycentric coding of continuous…
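To illustrate the idea of recoding a continuous variable without hard bin boundaries, here is a minimal sketch of one common form of barycentric (piecewise-linear fuzzy) coding. It is not taken from the paper: the function name `barycentric_code`, the choice of hinge points, and the clipping of out-of-range values are assumptions made for illustration. Each value is spread over the two neighbouring hinge categories so that the memberships sum to 1 and the value is the weighted average (barycenter) of the hinges.

```python
import numpy as np

def barycentric_code(x, hinges):
    """Code continuous values into len(hinges) fuzzy categories.

    Illustrative assumption, not the authors' implementation.
    `hinges` must be strictly increasing and have at least two entries.
    """
    x = np.asarray(x, dtype=float)
    hinges = np.asarray(hinges, dtype=float)
    # Values outside the hinge range get full membership in the end category.
    xc = np.clip(x, hinges[0], hinges[-1])
    # Index of the left hinge of the interval containing each value.
    left = np.searchsorted(hinges, xc, side="right") - 1
    left = np.clip(left, 0, hinges.size - 2)
    # Share of membership given to the right-hand hinge (linear interpolation).
    w_right = (xc - hinges[left]) / (hinges[left + 1] - hinges[left])
    coded = np.zeros((x.size, hinges.size))
    rows = np.arange(x.size)
    coded[rows, left] = 1.0 - w_right
    coded[rows, left + 1] = w_right
    return coded

# Hypothetical example: three categories (low, medium, high) with hinges 0, 5, 10.
hinges = [0.0, 5.0, 10.0]
codes = barycentric_code([0.0, 2.5, 5.0, 10.0], hinges)
print(codes)
```

Under crisp discretization, two nearby values such as 4.9 and 5.1 could land in different categories. Under this coding their rows are nearly identical, which is the sense in which less information is lost. The coded columns can then be placed next to the indicator columns of the categorical variables before clustering.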
Taxonomy
Topics: Complex Network Analysis Techniques · Sensory Analysis and Statistical Methods · Advanced Clustering Algorithms Research
