Clustering Approaches for Mixed-Type Data: A Comparative Study
Badih Ghattas, Alvaro Sanchez San-Benito

TL;DR
This paper compares various clustering methods for mixed-type data, analyzing their performance across different scenarios to identify the most effective approaches and understand their limitations.
Contribution
It provides a comprehensive comparison of state-of-the-art clustering methods for mixed-type data, highlighting their strengths and weaknesses in diverse conditions.
Findings
KAMILA, LCM, and k-prototypes achieved the best Adjusted Rand Index (ARI) scores.
Performance is significantly affected by cluster overlap, the proportion of continuous variables, and sample size.
None of the methods excels when variables strongly interact with cluster membership.
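For readers unfamiliar with the ARI used to rank the methods, the following is a minimal sketch of the standard pair-counting formula (Hubert and Arabie) in plain Python. It is not the authors' evaluation code; the function name `adjusted_rand_index` and the toy label vectors are illustrative assumptions.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two partitions of the same observations.

    Illustrative helper (name is an assumption, not from the paper).
    1.0 means identical partitions up to relabeling; values near 0 are
    what random labeling would give; negative values are possible.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label vectors must have the same length")
    n = len(labels_true)
    if n < 2:
        return 1.0  # no pairs to compare; treat as trivially identical

    # Contingency table counts and its row/column margins.
    cells = Counter(zip(labels_true, labels_pred))
    rows = Counter(labels_true)
    cols = Counter(labels_pred)

    # Number of pairs placed together in both, in truth, and in prediction.
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())

    expected = sum_rows * sum_cols / comb(n, 2)  # chance-level agreement
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions are one cluster
    return (sum_cells - expected) / (max_index - expected)


# Toy labels (made up for illustration only).
truth = [0, 0, 1, 2]
found = [0, 0, 1, 1]
score = adjusted_rand_index(truth, found)
print(round(score, 2))
```

Because the index is corrected for chance, it can be compared across scenarios with different numbers of clusters, which is presumably why it suits a simulation study of this kind.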
Abstract
Clustering is widely used in unsupervised learning to find homogeneous groups of observations within a dataset. However, clustering mixed-type data remains a challenge, as few existing approaches are suited for this task. This study reviews the state of the art in these approaches and compares them using various simulation models. The compared methods include the distance-based approaches k-prototypes, PDQ, and convex k-means, and the probabilistic methods KAy-means for MIxed LArge data (KAMILA), the mixture of Bayesian networks (MBNs), and the latent class model (LCM). The aim is to provide insights into the behavior of the different methods across a wide range of scenarios by varying experimental factors such as the number of clusters, cluster overlap, sample size, dimension, the proportion of continuous variables in the dataset, and the clusters' distribution. The degree of cluster overlap and…
Taxonomy
Topics: Bayesian Methods and Mixture Models · Advanced Clustering Algorithms Research · Statistical Methods and Bayesian Inference
