Fast Generation of Exchangeable Sequence of Clusters Data
Keith Levin, Brenda Betancourt

TL;DR
This paper introduces faster methods for generating data from Exchangeable Sequences of Clusters (ESC) models, a class of Bayesian random partition models. It provides closed-form expressions and analytical results that improve sampling efficiency and clarify the distribution of the number of clusters.
Contribution
The paper develops new, faster sampling algorithms for ESC models and derives the first analytical expressions for the distribution of the number of clusters.
Findings
Closed-form expressions for certain ESC models
Faster sampling methods compared to previous approaches
Analytical distribution of the number of clusters
Abstract
Recent advances in Bayesian models for random partitions have led to the formulation and exploration of Exchangeable Sequences of Clusters (ESC) models. Under ESC models, it is the cluster sizes that are exchangeable, rather than the observations themselves. This property is particularly useful for obtaining microclustering behavior, whereby cluster sizes grow sublinearly in the number of observations, as is common in applications such as record linkage, sparse networks and genomics. Unfortunately, the exchangeable clusters property comes at the cost of projectivity. As a consequence, in contrast to more traditional Dirichlet Process or Pitman-Yor process mixture models, samples a priori from ESC models cannot be easily obtained in a sequential fashion and instead require the use of rejection or importance sampling. In this work, drawing on connections between ESC models and discrete…
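The rejection-sampling baseline mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's faster method. It assumes the usual ESC construction, in which cluster sizes are drawn i.i.d. from a distribution on the positive integers and the sequence is kept only if its running total lands exactly on n. The function names and the shifted-geometric size distribution are hypothetical choices made for illustration.

```python
import random


def shifted_geometric(rng, p=0.5):
    """Hypothetical size distribution on {1, 2, ...} (an assumption, not from the paper)."""
    size = 1
    while rng.random() > p:
        size += 1
    return size


def sample_esc_cluster_sizes(n, draw_size, rng, max_tries=100_000):
    """Naive rejection sampler for cluster sizes that sum exactly to n.

    Draw i.i.d. sizes until the running total reaches or passes n.
    Accept the sequence if the total equals n; otherwise discard it and retry.
    """
    for _ in range(max_tries):
        sizes, total = [], 0
        while total < n:
            s = draw_size(rng)
            sizes.append(s)
            total += s
        if total == n:
            return sizes
    raise RuntimeError("no accepted sample within max_tries")


rng = random.Random(0)
sizes = sample_esc_cluster_sizes(20, shifted_geometric, rng)
print(len(sizes), sizes)  # number of clusters and their sizes
```

Every rejected attempt is wasted work, and the acceptance rate depends on the size distribution, which is the inefficiency that motivates faster samplers.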
Taxonomy
Topics: Bayesian Methods and Mixture Models · Census and Population Estimation · Statistical Methods and Bayesian Inference
