Inconsistency of Pitman-Yor process mixtures for the number of components
Jeffrey W. Miller, Matthew T. Harrison

TL;DR
This paper shows that the posterior on the number of components under Pitman-Yor process mixtures is inconsistent: on data drawn from a finite mixture, it does not concentrate at the true number of components.
Contribution
It proves that, for data from a finite mixture, the posterior distribution on the number of components under Pitman-Yor process mixtures (including Dirichlet process mixtures) is inconsistent, and that this holds over a wide variety of families of component distributions.
Findings
Posterior does not concentrate at the true number of components.
Inconsistency applies to a wide class of nonparametric mixtures.
Results hold for discrete and continuous exponential family distributions.
Abstract
In many applications, a finite mixture is a natural model, but it can be difficult to choose an appropriate number of components. To circumvent this choice, investigators are increasingly turning to Dirichlet process mixtures (DPMs), and Pitman-Yor process mixtures (PYMs), more generally. While these models may be well-suited for Bayesian density estimation, many investigators are using them for inferences about the number of components, by considering the posterior on the number of components represented in the observed data. We show that this posterior is not consistent: that is, on data from a finite mixture, it does not concentrate at the true number of components. This result applies to a large class of nonparametric mixtures, including DPMs and PYMs, over a wide variety of families of component distributions, including essentially all discrete families, as well as continuous…
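As background intuition only, the sketch below looks at the Chinese restaurant process (CRP), the prior on partitions induced by a Dirichlet process. It is not the paper's argument and says nothing about the posterior directly; it only illustrates the standard fact that the prior expected number of occupied clusters, sum over i from 0 to n-1 of alpha / (alpha + i), keeps growing with the sample size n. The function names and the concentration parameter value alpha = 1.0 are assumptions chosen for illustration.

```python
import random


def expected_crp_clusters(n, alpha=1.0):
    """Prior mean number of occupied clusters after n draws from CRP(alpha)."""
    # Customer i (0-indexed) opens a new table with probability alpha / (alpha + i).
    return sum(alpha / (alpha + i) for i in range(n))


def simulate_crp_clusters(n, alpha=1.0, seed=0):
    """Simulate one CRP seating of n customers and return the number of tables."""
    rng = random.Random(seed)
    counts = []  # counts[k] = number of customers currently at table k
    for i in range(n):
        if rng.random() < alpha / (alpha + i):
            # Open a new table (always happens for the first customer).
            counts.append(1)
        else:
            # Join an existing table with probability proportional to its size:
            # pick a uniformly random earlier customer and sit at their table.
            r = rng.randrange(i)
            acc = 0
            for k, c in enumerate(counts):
                acc += c
                if r < acc:
                    counts[k] += 1
                    break
    return len(counts)


if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        print(n, round(expected_crp_clusters(n), 2), simulate_crp_clusters(n))
```

Running the script prints, for each n, the prior mean and one simulated cluster count. The slow but unbounded growth in the prior is the kind of behaviour that makes the number of occupied clusters a questionable estimator of a fixed, finite number of components.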
Taxonomy
Topics: Bayesian Methods and Mixture Models · Stochastic processes and statistical mechanics · Diffusion and Search Dynamics
