Model-based clustering of multivariate binary data with dimension reduction
Michio Yamamoto, Kenichi Hayashi

TL;DR
This paper introduces a model-based clustering method for multivariate binary data that simultaneously identifies the optimal cluster structure and the low-dimensional subspace representing it. Sparse loadings make the subspace more interpretable and its extraction more stable.
Contribution
It extends latent class analysis by incorporating dimension reduction with sparsity, and develops an EM algorithm for efficient optimization.
Findings
Effective in simulation studies
Successfully applied to real datasets
Provides interpretable low-dimensional representations
Abstract
Clustering methods with dimension reduction have recently attracted considerable interest in statistics, and many methods that perform clustering and dimension reduction simultaneously have been proposed. This work presents a novel procedure for simultaneously determining the optimal cluster structure for multivariate binary data and the subspace that represents that cluster structure. The method is based on a finite mixture model of multivariate Bernoulli distributions, in which each component is assumed to have a low-dimensional representation of the cluster structure. The method can be considered an extension of the traditional latent class analysis model. Sparsity is introduced into the loading values that produce the low-dimensional subspace, for enhanced interpretability and more stable extraction of the subspace. An EM-based algorithm is developed to efficiently solve the…
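For orientation, the baseline that the abstract says is being extended (a finite mixture of multivariate Bernoulli distributions, i.e. traditional latent class analysis, fitted by EM) can be sketched as below. This is a minimal illustration of the baseline only, not the authors' method: it has no low-dimensional representation and no sparsity on loadings. The function name `fit_bernoulli_mixture`, its parameters, and the small smoothing constant `eps` are assumptions introduced here.

```python
import numpy as np

def fit_bernoulli_mixture(X, n_clusters, n_iter=100, seed=0, eps=1e-9):
    """EM for a plain Bernoulli mixture (latent class analysis baseline).

    Hypothetical helper for illustration; it omits the dimension reduction
    and sparsity that the paper adds on top of this model.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    n, p = X.shape
    weights = np.full(n_clusters, 1.0 / n_clusters)         # mixing proportions
    probs = rng.uniform(0.25, 0.75, size=(n_clusters, p))   # P(x_j = 1 | cluster)

    for _ in range(n_iter):
        # E-step: posterior cluster probabilities, computed in log space.
        log_post = (X @ np.log(probs).T
                    + (1.0 - X) @ np.log(1.0 - probs).T
                    + np.log(weights))
        log_post -= log_post.max(axis=1, keepdims=True)
        resp = np.exp(log_post)
        resp /= resp.sum(axis=1, keepdims=True)

        # M-step: closed-form updates; eps is an assumed smoothing term that
        # keeps every parameter strictly inside (0, 1).
        nk = resp.sum(axis=0)
        weights = (nk + eps) / (n + n_clusters * eps)
        probs = (resp.T @ X + eps) / (nk[:, None] + 2.0 * eps)

    return weights, probs, resp
```

Calling `fit_bernoulli_mixture(X, n_clusters=2)` on an `n × p` 0/1 array returns the mixing proportions, the per-cluster item probabilities, and the posterior membership matrix; `resp.argmax(axis=1)` gives hard cluster labels.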
Taxonomy
Topics: Bayesian Methods and Mixture Models · Statistical Methods and Bayesian Inference · Advanced Clustering Algorithms Research
