Fast Generation of Exchangeable Sequence of Clusters Data

Keith Levin; Brenda Betancourt

arXiv:2209.02844 · math.ST · September 8, 2022

TL;DR

This paper introduces faster methods for generating data from Exchangeable Sequences of Clusters (ESC) models, a class of Bayesian random partition models. It provides closed-form expressions and analytical results that improve sampling efficiency and clarify the distribution of the number of clusters.

Contribution

It develops new, faster sampling algorithms for ESC models and derives the first analytical expressions for the distribution of the number of clusters.

Findings

01 Closed-form expressions for certain ESC models
02 Faster sampling methods compared to previous approaches
03 Analytical distribution of the number of clusters

Abstract

Recent advances in Bayesian models for random partitions have led to the formulation and exploration of Exchangeable Sequences of Clusters (ESC) models. Under ESC models, it is the cluster sizes that are exchangeable, rather than the observations themselves. This property is particularly useful for obtaining microclustering behavior, whereby cluster sizes grow sublinearly in the number of observations, as is common in applications such as record linkage, sparse networks and genomics. Unfortunately, the exchangeable clusters property comes at the cost of projectivity. As a consequence, in contrast to more traditional Dirichlet process or Pitman-Yor process mixture models, samples a priori from ESC models cannot be easily obtained in a sequential fashion and instead require the use of rejection or importance sampling. In this work, drawing on connections between ESC models and discrete…
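As a rough illustration of why ESC priors call for rejection sampling, here is a naive prior sampler. It assumes the ESC construction as we understand it from earlier work: cluster sizes are drawn i.i.d. from a distribution on the positive integers, conditioned on some prefix of them summing exactly to n, and the n items are then allocated to clusters uniformly at random. The geometric size law and the names `geometric_size` and `esc_rejection_sample` are hypothetical choices for illustration. This is the kind of baseline the abstract contrasts with, not the faster method the paper develops.

```python
import random
from collections import Counter
from functools import partial
from typing import Callable, List, Tuple


def geometric_size(q: float, rng: random.Random) -> int:
    """Hypothetical size law: P(S = s) = (1 - q) * q**(s - 1) for s = 1, 2, ..."""
    s = 1
    while rng.random() < q:  # each extra unit is added with probability q
        s += 1
    return s


def esc_rejection_sample(
    n: int,
    size_sampler: Callable[[random.Random], int],
    rng: random.Random,
    max_tries: int = 100_000,
) -> Tuple[List[int], List[int]]:
    """Naive rejection sampler for an ESC-style prior partition of n items.

    Assumption: the ESC prior draws i.i.d. positive cluster sizes and conditions
    on some prefix of them summing exactly to n. Because sizes are positive,
    that event holds iff the first running total that reaches n equals n.
    """
    for _ in range(max_tries):
        sizes: List[int] = []
        total = 0
        while total < n:
            s = size_sampler(rng)
            sizes.append(s)
            total += s
        if total == n:  # accept; otherwise the draw overshot n and is discarded
            # Allocate items to clusters uniformly at random given the sizes.
            labels = [k for k, size in enumerate(sizes) for _ in range(size)]
            rng.shuffle(labels)
            return sizes, labels
    raise RuntimeError(f"no accepted draw in {max_tries} tries")


if __name__ == "__main__":
    rng = random.Random(0)
    draw = partial(geometric_size, 0.6)
    sizes, labels = esc_rejection_sample(20, draw, rng)
    print("cluster sizes:", sizes)
    print("labels:", labels)
    # Monte Carlo estimate of the prior distribution of the number of clusters K.
    k_counts = Counter(len(esc_rejection_sample(20, draw, rng)[0]) for _ in range(2000))
    print(sorted(k_counts.items()))
```

With the geometric law, the partial sums are the success times of independent Bernoulli(1 − q) trials, so K − 1 should follow Binomial(n − 1, 1 − q), which gives a quick sanity check. In general the acceptance rate is the probability that some partial sum lands exactly on n. It can be small for other size laws, and that cost is what motivates faster samplers.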

Taxonomy

Topics: Bayesian Methods and Mixture Models · Census and Population Estimation · Statistical Methods and Bayesian Inference