Scaled process priors for Bayesian nonparametric estimation of the unseen genetic variation
Federico Camerlenghi, Stefano Favaro, Lorenzo Masoero, Tamara Broderick

TL;DR
This paper introduces the stable-Beta scaled process prior for Bayesian nonparametric estimation of unseen genetic features, enabling more flexible and informative posterior inferences than traditional CRM-based models, with applications to genomic data.
Contribution
It proposes the stable-Beta scaled process (SB-SP) prior, which yields a negative binomial posterior for the number of unseen features, improving flexibility and interpretability over CRM priors.
Findings
Outperforms existing methods in estimation accuracy.
Provides better coverage for unseen feature estimation.
Maintains analytical tractability and computational efficiency.
Abstract
There is a growing interest in the estimation of the number of unseen features, mostly driven by biological applications. A recent work brought out a peculiar property of the popular completely random measures (CRMs) as prior models in Bayesian nonparametric (BNP) inference for the unseen-features problem: for fixed prior's parameters, they all lead to a Poisson posterior distribution for the number of unseen features, which depends on the sampling information only through the sample size. CRMs are thus not a flexible prior model for the unseen-features problem and, while the Poisson posterior distribution may be appealing for analytical tractability and ease of interpretability, its independence from the sampling information makes the BNP approach a questionable oversimplification, with posterior inferences being completely determined by the estimation of unknown prior's parameters. In…
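The limitation described in the abstract can be illustrated with a minimal sketch. It does not implement the paper's SB-SP prior. As a stand-in CRM-based model it assumes the one-parameter Indian buffet process, under which the j-th sampled individual displays Poisson(alpha / j) new features. The function name `unseen_poisson_mean`, the mass parameter `alpha`, the number of additional samples `m`, and the two toy feature matrices are all hypothetical choices made for illustration.

```python
def unseen_poisson_mean(sample, alpha, m):
    """Poisson mean for the number of new features in m further samples.

    Assumption: one-parameter Indian buffet process with fixed mass alpha,
    where the j-th individual displays Poisson(alpha / j) new features.
    """
    n = len(sample)  # the observed data enter only through the sample size
    # A sum of independent Poisson counts is Poisson with the summed mean.
    return alpha * sum(1.0 / (n + i) for i in range(1, m + 1))


# Two hypothetical binary feature matrices with the same number of rows.
sparse = [[1, 0, 0], [0, 0, 0]]
dense = [[1, 1, 1], [1, 1, 1]]

mean_sparse = unseen_poisson_mean(sparse, alpha=2.0, m=3)
mean_dense = unseen_poisson_mean(dense, alpha=2.0, m=3)
print(mean_sparse == mean_dense)
```

Under these assumptions the two samples give the same posterior mean even though their observed features differ sharply, because only the number of rows is used. This is the dependence on sample size alone that the abstract criticizes.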
Taxonomy
Topics: Bayesian Methods and Mixture Models · Statistical Methods and Inference · Gene expression and cancer classification
