How many data clusters are in the Galaxy data set? Bayesian cluster analysis in action
Bettina Grün, Gertraud Malsiner-Walli, Sylvia Frühwirth-Schnatter

TL;DR
This paper investigates how different prior assumptions in Bayesian mixture models influence the estimated number of clusters in the Galaxy data set, providing guidance for more effective and transparent Bayesian clustering.
Contribution
The paper performs a comprehensive sensitivity analysis of prior specifications in Bayesian mixture models, clarifying their impact on clustering results and offering practical recommendations.
Findings
Prior choices strongly affect the estimated number of clusters.
Certain prior specifications lead to sparser, more interpretable clustering solutions.
Simulation results support recommended prior settings for Bayesian clustering.
Abstract
In model-based clustering, the Galaxy data set is often used as a benchmark data set to study the performance of different modeling approaches. Aitkin (2001) compares maximum likelihood and Bayesian analyses of the Galaxy data set and expresses reservations about the Bayesian approach because the imposed prior assumptions remain rather obscure while playing a major role in the results obtained and the conclusions drawn. The aim of the paper is to address Aitkin's concerns about the Bayesian approach by shedding light on how the specified priors influence the number of estimated clusters. We perform a sensitivity analysis of different prior specifications for the mixture of finite mixtures (MFM) model, i.e., the mixture model where a prior on the number of components is included. We use an extensive set of different prior specifications in a full factorial design and assess their…
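To make the role of the prior concrete, the following is a minimal sketch, under our own assumptions, of the *prior* distribution that an MFM induces on the number of data clusters K+ (the number of non-empty components). It draws the number of components K, then symmetric Dirichlet weights with parameter gamma, then allocations for n observations, and counts the occupied components. The uniform prior on K over {1, ..., k_max}, the fixed gamma values, and the function names `sample_dirichlet` and `prior_num_clusters` are hypothetical choices for illustration. This is not the authors' code or their posterior analysis, and the priors studied in the paper may differ. n = 82 is used because the Galaxy data set contains 82 galaxy velocities.

```python
import random
from collections import Counter


def sample_dirichlet(alpha, k, rng):
    """Draw weights from a symmetric Dirichlet(alpha, ..., alpha) of length k."""
    # Normalised independent Gamma(alpha, 1) draws give a Dirichlet sample.
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    if total == 0.0:
        # Guard against underflow for tiny alpha: put all mass on one component.
        draws = [0.0] * k
        draws[rng.randrange(k)] = 1.0
        total = 1.0
    return [d / total for d in draws]


def prior_num_clusters(n, gamma, k_max, n_sims=2000, seed=1):
    """Monte Carlo estimate of the prior on K+, the number of non-empty components.

    Hypothetical prior choices: K ~ Uniform{1, ..., k_max} and
    weights | K ~ Dirichlet(gamma, ..., gamma) with a fixed gamma.
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_sims):
        k = rng.randint(1, k_max)                  # number of components K
        weights = sample_dirichlet(gamma, k, rng)  # component weights
        labels = rng.choices(range(k), weights=weights, k=n)  # allocations
        counts[len(set(labels))] += 1              # K+ = occupied components
    return {k_plus: c / n_sims for k_plus, c in sorted(counts.items())}


if __name__ == "__main__":
    # n = 82 matches the size of the Galaxy data set.
    for gamma in (0.1, 1.0, 10.0):
        prior = prior_num_clusters(n=82, gamma=gamma, k_max=10)
        mean_k_plus = sum(k * p for k, p in prior.items())
        print(f"gamma={gamma}: prior mean of K+ = {mean_k_plus:.2f}")
```

Comparing the printed means for small and large gamma illustrates the point at issue: with the same prior on K, a small Dirichlet parameter leaves many components empty and favours few data clusters, while a large one fills most components. Looping such a function over a grid of prior settings is one simple way to mimic a factorial sensitivity check on the prior side.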
Taxonomy
Topics: Bayesian Methods and Mixture Models · Statistical Methods and Bayesian Inference · Soil Geostatistics and Mapping
