Spying on the prior of the number of data clusters and the partition distribution in Bayesian cluster analysis
Jan Greve, Bettina Grün, Gertraud Malsiner-Walli, and Sylvia Frühwirth-Schnatter

TL;DR
This paper introduces a computational approach to characterize the prior distribution on data partitions in Bayesian clustering models, aiding in prior elicitation and understanding of model behavior.
Contribution
It provides a method to compute descriptive statistics of the prior on partitions for Bayesian mixture models, including enumeration of the prior on the number of data clusters and the moments of partition statistics.
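One concrete reading of "descriptive statistics of the prior on partitions" is the following Python sketch, which enumerates every partition of a small data set. It assumes a Dirichlet process mixture with concentration parameter alpha. Under that model, a partition with block sizes n_1, ..., n_k has prior probability alpha^k (n_1 - 1)! ... (n_k - 1)! / (alpha (alpha + 1) ... (alpha + N - 1)). The number of singleton clusters is a hypothetical example statistic. All function names are illustrative and do not come from fipp. The number of partitions grows with the Bell numbers, so brute force only serves as a small-N check.

```python
from fractions import Fraction
from math import factorial


def set_partitions(items):
    """Yield every set partition of `items` as a list of blocks (lists)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Add `first` to each existing block in turn ...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ... or give it a new block of its own.
        yield [[first]] + partition


def rising_factorial(x, n):
    """x (x + 1) ... (x + n - 1), computed exactly."""
    out = Fraction(1)
    for i in range(n):
        out *= x + i
    return out


def ewens_prob(block_sizes, alpha):
    """Prior probability of one specific partition under a DP with concentration alpha."""
    num = alpha ** len(block_sizes)
    for size in block_sizes:
        num *= factorial(size - 1)
    return num / rising_factorial(alpha, sum(block_sizes))


def partition_prior_summary(n, alpha):
    """Exact P(K+ = k) and E[number of singleton clusters] by full enumeration."""
    alpha = Fraction(alpha)
    p_kplus = {}
    expected_singletons = Fraction(0)
    for partition in set_partitions(list(range(n))):
        sizes = [len(block) for block in partition]
        p = ewens_prob(sizes, alpha)
        p_kplus[len(sizes)] = p_kplus.get(len(sizes), Fraction(0)) + p
        expected_singletons += p * sizes.count(1)
    return p_kplus, expected_singletons
```

For N = 4 and alpha = 1, partition_prior_summary(4, 1) gives P(K+ = k) = 6/24, 11/24, 6/24, 1/24 for k = 1, ..., 4, and an expected number of singleton clusters of exactly 1.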
Findings
Efficient enumeration of the prior on the number of data clusters.
Calculation of the first two moments of partition statistics.
Implementation available in the R package 'fipp'.
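In the Dirichlet process case, the prior on the number of data clusters K+ can be enumerated without listing partitions. The Python sketch below is an illustration under that assumption and is not necessarily the algorithm used in fipp. It builds the distribution with the sequential Chinese-restaurant construction and then derives the first two moments, checking them against the known closed forms.

```python
def dp_prior_num_clusters(n, alpha):
    """Return [P(K+ = 0), ..., P(K+ = n)] for a DP mixture with concentration alpha > 0.

    Chinese-restaurant recursion: given i earlier observations, observation
    i + 1 opens a new data cluster with probability alpha / (alpha + i) and
    joins an existing one otherwise. Cost is O(n^2) operations.
    """
    probs = [1.0]  # before any data, K+ = 0 with probability 1
    for i in range(n):
        p_new = alpha / (alpha + i)
        nxt = [0.0] * (len(probs) + 1)
        for k, p in enumerate(probs):
            nxt[k] += p * (1.0 - p_new)  # joins an existing data cluster
            nxt[k + 1] += p * p_new      # opens a new data cluster
        probs = nxt
    return probs


def first_two_moments(probs):
    """Mean and variance of K+ from its probability mass function."""
    mean = sum(k * p for k, p in enumerate(probs))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    return mean, var


n, alpha = 100, 2.5
mean, var = first_two_moments(dp_prior_num_clusters(n, alpha))
# Closed forms: K+ is a sum of independent Bernoulli(alpha / (alpha + i)) indicators.
closed_mean = sum(alpha / (alpha + i) for i in range(n))
closed_var = sum(alpha * i / (alpha + i) ** 2 for i in range(n))
print(mean, closed_mean, var, closed_var)
```

Because K+ is a sum of independent indicators in this model, its mean and variance also reduce to simple sums. The recursion is still useful because it returns the whole distribution, not just its moments.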
Abstract
Cluster analysis aims at partitioning data into groups or clusters. In applications, the number of clusters is commonly unknown. Bayesian mixture models employed in such applications usually specify a flexible prior that takes into account the uncertainty with respect to the number of clusters. However, a major empirical challenge in using these models is characterising the induced prior on the partitions. This work introduces an approach to compute descriptive statistics of the prior on the partitions for three selected Bayesian mixture models developed in the areas of Bayesian finite mixtures and Bayesian nonparametrics. The proposed methodology involves computationally efficient enumeration of the prior on the number of clusters in-sample (termed "data clusters") and determining the first two prior moments of symmetric…
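A simple case on the finite-mixture side is a mixture with a fixed number of components K and symmetric Dirichlet(gamma) weights. This model and the function name are assumptions chosen for illustration, because the abstract does not specify which finite mixtures are studied. Integrating out the weights gives a Pólya-urn scheme, so a sequential recursion of the same kind yields the prior on the number of data clusters. A model that also places a prior on K would average these conditional probabilities over that prior.

```python
def finite_mixture_prior_num_clusters(n, K, gamma):
    """Return [P(K+ = 0 | K), ..., P(K+ = n | K)] for a finite mixture with K
    components and symmetric Dirichlet(gamma, ..., gamma) weights.

    With the weights integrated out, given i earlier observations spread over
    k occupied components, observation i + 1 joins an occupied component with
    probability (i + k * gamma) / (i + K * gamma) and one of the K - k empty
    components with probability (K - k) * gamma / (i + K * gamma).
    """
    probs = [1.0]  # before any data, K+ = 0 with probability 1
    for i in range(n):
        denom = i + K * gamma
        nxt = [0.0] * (len(probs) + 1)
        for k, p in enumerate(probs):
            nxt[k] += p * (i + k * gamma) / denom
            if k < K:  # a new component can only be filled while one is empty
                nxt[k + 1] += p * (K - k) * gamma / denom
        probs = nxt
    return probs


# Data clusters can never outnumber components: here P(K+ > 3) is zero.
probs = finite_mixture_prior_num_clusters(n=10, K=3, gamma=0.5)
print([round(p, 4) for p in probs])
```

If K grows while K * gamma is held fixed, these probabilities approach the Dirichlet process ones with alpha = K * gamma.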
