Bayesian nonparametric modelling of sequential discoveries
Alessandro Zito, Tommaso Rigon, Otso Ovaskainen, David Dunson

TL;DR
This paper introduces a flexible Bayesian nonparametric approach for modeling sequential discoveries of new entities, such as species or words, with applications demonstrated on biodiversity data.
Contribution
It proposes a novel Bayesian nonparametric method for species sampling that directly specifies the probability of a new discovery and can incorporate covariates.
Findings
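The asymptotic behavior and finite-sample properties of the approach are studied in detail.
The enlarged class of sequential processes includes highly tractable special cases, among them a subclass with appealing theoretical and computational properties.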
Application to biodiversity data demonstrates practical utility.
Abstract
We aim at modelling the appearance of distinct tags in a sequence of labelled objects. Common examples of this type of data include words in a corpus or distinct species in a sample. These sequential discoveries are often summarised via accumulation curves, which count the number of distinct entities observed in an increasingly large set of objects. We propose a novel Bayesian nonparametric method for species sampling modelling by directly specifying the probability of a new discovery, therefore allowing for flexible specifications. The asymptotic behavior and finite sample properties of such an approach are extensively studied. Interestingly, our enlarged class of sequential processes includes highly tractable special cases. We present a subclass of models characterized by appealing theoretical and computational properties. Moreover, due to strong connections with logistic regression…
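The two objects the abstract describes, an accumulation curve and a directly specified probability of a new discovery, can be illustrated with a minimal Python sketch. The function names below are hypothetical and do not come from the paper. The discovery probability alpha / (alpha + n) is the classical Dirichlet process rule, used here only as a familiar stand-in and not as the paper's own specification. The sketch also assumes that the discovery indicators are drawn independently given the number of objects seen so far.

```python
import random
from itertools import accumulate


def accumulation_curve(labels):
    """Number of distinct labels among the first 1, 2, ..., n objects."""
    seen = set()
    curve = []
    for label in labels:
        seen.add(label)
        curve.append(len(seen))
    return curve


def dp_prob_new(alpha):
    """Illustrative choice (an assumption, not the paper's model):
    Dirichlet process probability that the next object is new,
    given that n objects have already been observed."""
    return lambda n: alpha / (alpha + n)


def simulate_discoveries(n_objects, prob_new, seed=0):
    """Draw indicators D_1..D_n, where D_i = 1 marks a new discovery.

    prob_new(n) is the probability that object n + 1 carries a new label.
    Indicators are sampled independently, which is an assumption of this sketch.
    """
    rng = random.Random(seed)
    return [1 if rng.random() < prob_new(n) else 0 for n in range(n_objects)]


# Accumulation curve of an observed sequence of tags (toy data).
words = ["the", "cat", "the", "dog", "cat", "bird"]
observed_curve = accumulation_curve(words)  # [1, 2, 2, 3, 3, 4]

# Simulated accumulation curve: running sum of the discovery indicators.
indicators = simulate_discoveries(50, dp_prob_new(alpha=2.0))
simulated_curve = list(accumulate(indicators))
```

Swapping `dp_prob_new` for any other function of `n` with values in [0, 1] changes the shape of the simulated curve, which is the kind of flexibility the abstract attributes to specifying the discovery probability directly.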
Taxonomy
Topics: Bayesian Methods and Mixture Models · Sensory Analysis and Statistical Methods · Fermentation and Sensory Analysis
