Bayesian mixture models (in)consistency for the number of clusters
Louise Alamichel, Daria Bystrova, Julyan Arbel, Guillaume Kon Kam King

TL;DR
This paper studies the inconsistency of Bayesian nonparametric mixture models in estimating the true number of clusters. It extends known results to a broader class of processes and shows that an existing post-processing algorithm can restore consistency.
Contribution
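It extends the analysis of cluster-number inconsistency to Gibbs-type processes and their finite-dimensional representations. It also extends a post-processing algorithm, originally introduced for the Dirichlet process, to these models to obtain a consistent estimate.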
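Pitman-Yor processes are the best-known Gibbs-type priors beyond the Dirichlet process. As a minimal sketch (not code from the paper), the function below draws a random partition from the two-parameter Chinese restaurant process. The name `sample_pitman_yor_partition` and the parameter labels `theta` (concentration) and `sigma` (discount) are illustrative choices. Setting `sigma = 0` recovers the Dirichlet process.

```python
import random


def sample_pitman_yor_partition(n, theta, sigma, seed=None):
    """Draw cluster sizes for n items from the two-parameter (Pitman-Yor)
    Chinese restaurant process.

    Requires 0 <= sigma < 1 and theta > -sigma; sigma = 0 gives the
    Dirichlet process with concentration theta.
    """
    if not (0 <= sigma < 1) or theta <= -sigma:
        raise ValueError("need 0 <= sigma < 1 and theta > -sigma")
    rng = random.Random(seed)
    sizes = []  # sizes[j] is the number of items in cluster j
    for _ in range(n):
        k = len(sizes)
        if k == 0:
            sizes.append(1)  # the first item always opens a new cluster
            continue
        # Unnormalized probabilities: existing cluster j gets n_j - sigma,
        # a new cluster gets theta + sigma * k (they sum to theta plus the
        # number of items seated so far).
        weights = [nj - sigma for nj in sizes] + [theta + sigma * k]
        j = rng.choices(range(k + 1), weights=weights)[0]
        if j == k:
            sizes.append(1)
        else:
            sizes[j] += 1
    return sizes
```

The number of clusters in one prior draw is `len(sample_pitman_yor_partition(1000, 1.0, 0.5, seed=0))`. Repeating this for increasing n shows the count keeps growing, polynomially in n when `sigma > 0` rather than logarithmically as in the Dirichlet case.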
Findings
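Dirichlet process and Pitman-Yor process mixture models have an inconsistent posterior for the number of clusters when the true number of components is finite.
The inconsistency extends to Gibbs-type processes and to their finite-dimensional representations, such as the Dirichlet multinomial process.
A post-processing algorithm introduced for the Dirichlet process extends to these more general models and gives a consistent estimate of the number of clusters.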
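The post-processing idea can be illustrated on a single posterior draw of the mixing measure. A known algorithm of this kind for the Dirichlet process is the Merge-Truncate-Merge procedure of Guha, Ho and Nguyen; that it is the one extended here is an assumption. The sketch below is a simplified merge-then-truncate variant, not the exact procedure. It assumes one-dimensional atoms, and `merge_truncate`, `merge_radius` and `min_weight` are hypothetical names with fixed, user-chosen thresholds.

```python
def merge_truncate(atoms, weights, merge_radius, min_weight):
    """Simplified merge-then-truncate post-processing of one posterior draw
    of a mixing measure sum_j weights[j] * delta(atoms[j]), with real-valued
    atoms. merge_radius and min_weight are hypothetical user-chosen
    thresholds. Returns (atoms, weights); the length is the cluster estimate.
    """
    if len(atoms) != len(weights):
        raise ValueError("atoms and weights must have the same length")
    if not atoms:
        return [], []

    # Visit atoms from heaviest to lightest.
    order = sorted(range(len(atoms)), key=lambda j: weights[j], reverse=True)

    # Merge: each atom is absorbed by the first retained (heavier) atom
    # within merge_radius; otherwise it becomes a new retained atom.
    kept_atoms, kept_weights = [], []
    for j in order:
        for m, a in enumerate(kept_atoms):
            if abs(atoms[j] - a) <= merge_radius:
                kept_weights[m] += weights[j]
                break
        else:
            kept_atoms.append(atoms[j])
            kept_weights.append(weights[j])

    # Truncate: atoms lighter than min_weight hand their mass to the nearest
    # surviving atom. If none survives, keep the heaviest one.
    big = [m for m, w in enumerate(kept_weights) if w >= min_weight]
    if not big:
        big = [max(range(len(kept_weights)), key=lambda m: kept_weights[m])]
    big_set = set(big)
    out_atoms = [kept_atoms[m] for m in big]
    out_weights = [kept_weights[m] for m in big]
    for m, w in enumerate(kept_weights):
        if m in big_set:
            continue
        nearest = min(range(len(out_atoms)),
                      key=lambda b: abs(out_atoms[b] - kept_atoms[m]))
        out_weights[nearest] += w
    return out_atoms, out_weights
```

For example, `merge_truncate([0.0, 0.05, 3.0, 3.02, 10.0], [0.4, 0.1, 0.3, 0.18, 0.02], 0.1, 0.05)` returns atoms at 0.0 and 3.0, each with weight 0.5 up to floating-point rounding, so this draw would be read as two clusters. Applying the function to every posterior draw yields a post-processed posterior on the number of clusters.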
Abstract
Bayesian nonparametric mixture models are common for modeling complex data. While these models are well-suited for density estimation, recent results proved posterior inconsistency of the number of clusters when the true number of components is finite, for the Dirichlet process and Pitman-Yor process mixture models. We extend these results to additional Bayesian nonparametric priors such as Gibbs-type processes and finite-dimensional representations thereof. The latter include the Dirichlet multinomial process and the recently proposed Pitman-Yor multinomial and normalized generalized gamma multinomial processes. We show that mixture models based on these processes are also inconsistent in the number of clusters and discuss possible solutions. Notably, we show that a post-processing algorithm introduced for the Dirichlet process can be extended to more general models and provides a consistent method…
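To see how a finite-dimensional representation differs from the Dirichlet process a priori, one can compare the prior mean number of clusters among n observations. The sketch below assumes the common symmetric Dirichlet(alpha/H, ..., alpha/H) weights for a Dirichlet multinomial process with H components; the function names are illustrative. These are prior quantities only and do not by themselves establish the posterior inconsistency discussed above.

```python
def expected_clusters_dp(n, alpha):
    """Prior mean number of clusters among n draws from a Dirichlet process
    with concentration alpha: sum_{i=0}^{n-1} alpha / (alpha + i)."""
    return sum(alpha / (alpha + i) for i in range(n))


def expected_clusters_dm(n, alpha, H):
    """Prior mean number of occupied components among n draws when the
    weights are Dirichlet(alpha/H, ..., alpha/H) on H components (assumed
    Dirichlet multinomial parameterization). A given component stays empty
    with probability prod_{i=0}^{n-1} (alpha - alpha/H + i) / (alpha + i).
    """
    p_empty = 1.0
    for i in range(n):
        p_empty *= (alpha - alpha / H + i) / (alpha + i)
    return H * (1.0 - p_empty)


if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        print(n, expected_clusters_dp(n, 1.0), expected_clusters_dm(n, 1.0, 50))
```

Under these assumptions the Dirichlet process mean grows roughly like alpha log n, whereas the Dirichlet multinomial mean can never exceed H and approaches the Dirichlet process value as H grows for fixed n.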
Taxonomy
Topics: Bayesian Methods and Mixture Models · Census and Population Estimation · Statistical Methods and Bayesian Inference
