Clustering structure for species sampling sequences with general base measure
Federico Bassetti, Lucia Ladelli

TL;DR
This paper studies the clustering behavior of species sampling sequences with general base measures, providing new stochastic representations and explicit formulas for their partition structures, with implications for Bayesian nonparametrics.
Contribution
It introduces a stochastic representation for species sampling sequences with general base measures and derives explicit EPPF expressions, advancing understanding of their clustering properties.
Findings
Derived a stochastic representation in terms of latent exchangeable partitions.
Provided explicit formulas for the EPPF of the generated partition in terms of the EPPF of the latent partition.
Analyzed the asymptotic behavior of the total number of blocks and of the number of blocks with fixed cardinality.
Abstract
We investigate the clustering structure of species sampling sequences with general base measure. Such sequences are exchangeable, with a species sampling random probability as directing measure. The clustering properties of these sequences are relevant to Bayesian nonparametric applications, where mixed base measures are used, for example, to accommodate sharp hypotheses in regression problems and to provide sparsity. In this paper, we prove a stochastic representation for such a sequence in terms of a latent exchangeable random partition. We provide an explicit expression for the EPPF of the partition generated by the sequence in terms of the EPPF of the latent partition. We investigate the asymptotic behaviour of the total number of blocks and of the number of blocks with fixed cardinality in the partition generated by the sequence.
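The role of the latent partition can be illustrated with a small simulation. The sketch below is not the authors' construction. It assumes a Chinese restaurant process with parameter `theta` as the latent exchangeable partition, and a spike-and-slab base measure with an atom at 0 of mass `spike_prob` and a uniform slab on [1, 2). All function and parameter names are hypothetical. Each latent block receives one independent draw from the base measure, and every observation inherits the value of its block, so latent blocks that draw the atom merge into a single observed cluster.

```python
import random


def latent_crp_labels(n, theta, rng):
    """Latent block labels from a Chinese restaurant process (assumed latent partition)."""
    labels, sizes = [], []
    for i in range(n):
        # Observation i opens a new block with probability theta / (theta + i),
        # otherwise joins existing block j with probability sizes[j] / (theta + i).
        u = rng.random() * (theta + i)
        acc = 0.0
        chosen = len(sizes)  # default: open a new block
        for j, size in enumerate(sizes):
            acc += size
            if u < acc:
                chosen = j
                break
        if chosen == len(sizes):
            sizes.append(1)
        else:
            sizes[chosen] += 1
        labels.append(chosen)
    return labels


def draw_mixed_base(spike_prob, rng):
    """One draw from the assumed mixed base measure: atom at 0.0 or uniform slab on [1, 2)."""
    if rng.random() < spike_prob:
        return 0.0
    return 1.0 + rng.random()


def sample_sequence(n, theta, spike_prob, seed=0):
    """Return (latent labels, observed values) for a sequence of length n."""
    rng = random.Random(seed)
    labels = latent_crp_labels(n, theta, rng)
    n_latent = max(labels) + 1
    # One base-measure draw per latent block; observations copy their block's value.
    block_values = [draw_mixed_base(spike_prob, rng) for _ in range(n_latent)]
    values = [block_values[c] for c in labels]
    return labels, values


if __name__ == "__main__":
    labels, values = sample_sequence(n=500, theta=5.0, spike_prob=0.3, seed=1)
    print("latent blocks:  ", len(set(labels)))
    print("observed blocks:", len(set(values)))
```

With `spike_prob = 0` the base measure is diffuse and the observed partition coincides with the latent one. With a positive spike, the observed partition is coarser, because all latent blocks assigned the value 0 collapse into one cluster.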
Taxonomy
Topics: Bayesian Methods and Mixture Models · Statistical Methods and Inference · Advanced Clustering Algorithms Research
