Model-based clustering based on sparse finite Gaussian mixtures
Gertraud Malsiner-Walli, Sylvia Frühwirth-Schnatter, Bettina Grün

TL;DR
This paper introduces a Bayesian model-based clustering method using sparse finite Gaussian mixtures that automatically determines the number of clusters and identifies relevant variables through hierarchical priors and MCMC sampling.
Contribution
It proposes a novel approach combining sparse hierarchical priors with overfitting mixtures and relabeling techniques to simultaneously estimate the number of clusters and relevant variables.
Findings
On simulated data, the method accurately estimates the number of clusters.
The normal gamma prior on the component means improves parameter estimates and helps identify cluster-relevant variables.
The approach is also applied successfully to benchmark datasets.
Abstract
In the framework of Bayesian model-based clustering based on a finite mixture of Gaussian distributions, we present a joint approach to estimate the number of mixture components and identify cluster-relevant variables simultaneously as well as to obtain an identified model. Our approach consists in specifying sparse hierarchical priors on the mixture weights and component means. In a deliberately overfitting mixture model the sparse prior on the weights empties superfluous components during MCMC. A straightforward estimator for the true number of components is given by the most frequent number of non-empty components visited during MCMC sampling. Specifying a shrinkage prior, namely the normal gamma prior, on the component means leads to improved parameter estimates as well as identification of cluster-relevant variables. After estimating the mixture model using MCMC methods based on…
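The estimator described in the abstract, the most frequent number of non-empty components visited during MCMC, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `estimate_num_components`, the `allocations` array layout (one row of component labels per MCMC draw), and the toy draws are all assumptions made for this example, and the sampler that would produce such draws is not shown.

```python
import numpy as np


def estimate_num_components(allocations, n_components):
    """Return the modal number of non-empty components across MCMC draws.

    allocations  : (n_draws, n_obs) integer array; entry [m, i] is the
                   component label (0 .. n_components - 1) assigned to
                   observation i in draw m. Hypothetical layout.
    n_components : number of components K in the overfitting mixture.
    """
    allocations = np.asarray(allocations)
    # For each draw, count how many components have at least one observation.
    non_empty = np.array([
        np.count_nonzero(np.bincount(draw, minlength=n_components))
        for draw in allocations
    ])
    # Most frequent value; ties are resolved toward the smaller count.
    values, counts = np.unique(non_empty, return_counts=True)
    return int(values[np.argmax(counts)]), non_empty


# Toy allocation draws (made up for illustration, not from the paper):
# 4 draws, 5 observations, overfitting mixture with K = 4 components.
draws = np.array([
    [0, 0, 1, 1, 2],
    [0, 0, 1, 1, 1],
    [0, 3, 3, 0, 0],
    [2, 2, 2, 2, 2],
])
k_hat, non_empty = estimate_num_components(draws, n_components=4)
print(k_hat, non_empty)
```

In practice the draws would come from a sampler run after burn-in, and the count is unaffected by label switching because it depends only on how many labels are occupied, not on which ones.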
