Graphical model-based clustering of categorical data
Laura Ferrini, Federico Castelletti

TL;DR
This paper introduces a Bayesian clustering method for multivariate categorical data that uses graphical models to account explicitly for dependencies among variables within clusters, rather than assuming the variables are independent given the cluster allocation.
Contribution
It proposes a Dirichlet Process mixture model of categorical graphical models, enabling clustering based on dependence structures and providing full Bayesian inference with MCMC algorithms.
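To give a feel for the Dirichlet Process prior on cluster allocations, here is a minimal sketch of its Chinese Restaurant Process representation. It is not the paper's MCMC algorithm: it only draws a partition from the prior, and the function name `sample_crp_allocations` and the concentration parameter `alpha` are illustrative assumptions.

```python
import random


def sample_crp_allocations(n, alpha, seed=0):
    """Draw cluster labels for n observations from a Chinese Restaurant Process.

    alpha (> 0) is the concentration parameter; larger values tend to
    produce more clusters. Hypothetical helper, for illustration only.
    """
    rng = random.Random(seed)
    labels = []
    counts = []  # counts[k] = number of observations currently in cluster k
    for _ in range(n):
        # An existing cluster k is chosen with weight counts[k];
        # a brand-new cluster is opened with weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)   # open a new cluster
        else:
            counts[k] += 1     # join an existing cluster
        labels.append(k)
    return labels


if __name__ == "__main__":
    print(sample_crp_allocations(10, alpha=1.0, seed=42))
```

The number of clusters is not fixed in advance: it grows with the data, which is the main reason a Dirichlet Process mixture is attractive when the number of latent sub-groups is unknown.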
Findings
Graphical model-based clustering outperforms independence-based methods.
The approach effectively captures dependence structures in real datasets.
Simulation studies validate the method's accuracy and robustness.
Abstract
Clustering multivariate data is a pervasive task in many applied problems, particularly in social studies and life science. Model-based approaches to clustering rely on mixture models, where each mixture component corresponds to the kernel of a distribution characterizing a latent sub-group. Current methods developed within this framework employ multivariate distributions built under the assumption of independence among variables given the cluster allocation. Accordingly, possible dependence structures characterizing differences across groups are not directly accounted for during the clustering process. In this paper we consider multivariate categorical data, and introduce a model-based clustering method which employs graphical models as a tool to encode dependencies between variables. Specifically, we consider a Dirichlet Process mixture of categorical graphical models, which clusters…
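The independence assumption that the abstract attributes to current methods can be made concrete with a small sketch. Under that assumption, the within-cluster probability of an observation is a product of per-variable categorical probabilities. The function name `log_kernel_independent` and the layout of `theta` are assumptions made for illustration; the graphical-model kernel proposed in the paper is not reproduced here.

```python
import math


def log_kernel_independent(x, theta):
    """Log-probability of one categorical observation within a cluster,
    assuming the variables are independent given the cluster.

    x:     list of category indices, one per variable
    theta: theta[j][c] = P(X_j = c | cluster), hypothetical layout
    """
    return sum(math.log(theta[j][xj]) for j, xj in enumerate(x))


if __name__ == "__main__":
    # Two binary variables with cluster-specific marginal probabilities.
    theta = [[0.5, 0.5], [0.25, 0.75]]
    print(math.exp(log_kernel_independent([0, 1], theta)))  # 0.5 * 0.75
```

Because this kernel factorizes over variables, two clusters with the same marginal probabilities but different dependence patterns are indistinguishable to it, which is the limitation the graphical-model kernel is meant to address.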
Taxonomy
Topics: Bayesian Methods and Mixture Models · Statistical Methods and Bayesian Inference · Advanced Clustering Algorithms Research
