Frequency of Frequencies Distributions and Size Dependent Exchangeable Random Partitions
Mingyuan Zhou, Stefano Favaro, Stephen G Walker

TL;DR
This paper introduces a flexible probabilistic model for the frequency of frequencies (FoF) distribution in which the distribution of random partitions depends on the population size. This lets it model data whose clustering structure changes with sample size, as demonstrated on real datasets.
Contribution
The paper proposes a novel cluster structure, arising from a completely random measure mixed Poisson process, that allows size-dependent exchangeable random partitions and frequency of frequencies (FoF) modeling, together with a new Gibbs sampling method for inference.
Findings
Model effectively captures size-dependent clustering patterns.
Demonstrates improved fit on real text, genomic, and survey data.
Provides a practical Gibbs sampling algorithm for FoF extrapolation.
Abstract
Motivated by the fundamental problem of modeling the frequency of frequencies (FoF) distribution, this paper introduces the concept of a cluster structure to define a probability function that governs the joint distribution of a random count and its exchangeable random partitions. A cluster structure, naturally arising from a completely random measure mixed Poisson process, allows the probability distribution of the random partitions of a subset of a population to be dependent on the population size, a distinct and motivated feature that makes it more flexible than a partition structure. This allows it to model an entire FoF distribution whose structural properties change as the population size varies. A FoF vector can be simulated by drawing an infinite number of Poisson random variables, or by a stick-breaking construction with a finite random number of steps. A generalized negative…
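To make the central object concrete, the sketch below computes an empirical FoF vector from a list of observations: for each cluster size j, it counts how many distinct clusters contain exactly j members. It only illustrates the definition, not the paper's model or its sampler; the function name `frequency_of_frequencies` and the toy token list are assumptions introduced here for illustration.

```python
from collections import Counter


def frequency_of_frequencies(labels):
    """Return {j: number of clusters with exactly j members}.

    `labels` is any iterable of hashable cluster labels
    (for example, the word tokens of a text).
    """
    cluster_sizes = Counter(labels)             # label -> cluster size
    fof = Counter(cluster_sizes.values())       # size  -> number of clusters
    return dict(sorted(fof.items()))


# Hypothetical toy sample: each distinct word is one cluster.
tokens = "the cat saw the dog and the dog saw a bird".split()

fof_full = frequency_of_frequencies(tokens)        # all 11 tokens
fof_prefix = frequency_of_frequencies(tokens[:5])  # first 5 tokens only

print(fof_full)
print(fof_prefix)
```

In this toy sample the FoF vector of the first five tokens differs in shape from that of the full sample, which is the kind of change with sample size that a size-dependent model is meant to capture. The sizes weighted by their counts always sum to the sample size, which is a quick sanity check on any FoF vector.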
Taxonomy
Topics: Bayesian Methods and Mixture Models · Stochastic processes and statistical mechanics · Algorithms and Data Compression
