# Extremes and gaps in sampling from a GEM random discrete distribution

**Authors:** Jim Pitman, Yuri Yakubovich

arXiv: 1701.06294 · 2017-01-24

## TL;DR

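This paper shows that the gaps between order statistics in a sample from a GEM$(0,\theta)$ distribution are distributed like independent geometric variables, which gives the exact distribution of the sample maximum and recovers most known GEM$(0,\theta)$ sampling formulas. It also finds the growth rate of the maximum for the two-parameter GEM$(\alpha,\theta)$ distribution.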

## Contribution

The paper extends a known result for the minimum of a GEM$(0,\theta)$ sample to all gaps between order statistics and to the maximum, and it determines the growth rate and limit distribution of the maximum in the two-parameter GEM$(\alpha,\theta)$ model.

## Key findings

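- In a sample of size $n$ from GEM$(0,\theta)$, the gaps between consecutive order statistics are distributed like the first $n$ terms of a sequence of independent geometric$(i/(i+\theta))$ variables.
- The maximum of a GEM$(0,\theta)$ sample grows like $\theta\log(n)$ and is asymptotically normal.
- Most known exact formulas for GEM$(0,\theta)$ sampling statistics, including the Ewens and Donnelly–Tavaré sampling formulas, follow as consequences.
- For GEM$(\alpha,\theta)$, the maximum grows like a random multiple of $n^{\alpha/(1-\alpha)}$, and the limit distribution of the multiplier is identified.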
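
As a worked illustration of the $\theta\log(n)$ growth, the mean and variance of the maximum can be written out from the gap representation $X_{n:n} \overset{d}{=} 1 + \sum_{i=1}^n G_i$ given in the abstract. This sketch assumes the geometric$(p)$ convention with support $\{0,1,2,\ldots\}$ and success probability $p$ (mean $(1-p)/p$, variance $(1-p)/p^2$), and writes $H_n$ for the $n$-th harmonic number; neither the convention nor this notation is spelled out in the summary.

$$
\mathbb{E}\,X_{n:n} = 1 + \sum_{i=1}^n \frac{\theta}{i} = 1 + \theta H_n \sim \theta\log n,
\qquad
\operatorname{Var} X_{n:n} = \sum_{i=1}^n \frac{\theta(i+\theta)}{i^2} = \theta H_n + \theta^2\sum_{i=1}^n \frac{1}{i^2} \sim \theta\log n .
$$

Under that convention the mean and the variance are both of order $\theta\log n$, which matches a normal limit for a sum of independent terms.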

## Abstract

We show that in a sample of size $n$ from a GEM$(0,\theta)$ random discrete distribution, the gaps $G_{i:n}:= X_{n-i+1:n} - X_{n-i:n}$ between order statistics $X_{1:n} \le \cdots \le X_{n:n}$ of the sample, with the convention $G_{n:n} := X_{1:n} - 1$, are distributed like the first $n$ terms of an infinite sequence of independent geometric$(i/(i+\theta))$ variables $G_i$. This extends a known result for the minimum $X_{1:n}$ to other gaps in the range of the sample, and implies that the maximum $X_{n:n}$ has the distribution of $1 + \sum_{i=1}^n G_i$, hence the known result that $X_{n:n}$ grows like $\theta\log(n)$ as $n\to\infty$, with an asymptotically normal distribution. Other consequences include most known formulas for the exact distributions of GEM$(0,\theta)$ sampling statistics, including the Ewens and Donnelly–Tavaré sampling formulas. For the two-parameter GEM$(\alpha,\theta)$ distribution we show that the maximal value grows like a random multiple of $n^{\alpha/(1-\alpha)}$ and find the limit distribution of the multiplier.
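
The gap statement in the abstract can be explored numerically. The following is a minimal simulation sketch, not the authors' method: it assumes the standard stick-breaking construction of GEM$(0,\theta)$ with independent Beta$(1,\theta)$ fractions, and the function names `sample_gem_counts` and `order_stats_and_gaps` are hypothetical helpers introduced only for illustration.

```python
import random


def sample_gem_counts(n, theta, rng):
    """Counts of the values 1, 2, ... among n draws from one GEM(0, theta) realization.

    Assumes stick-breaking: the j-th fraction W_j is Beta(1, theta), and a draw
    that has passed values 1..j-1 takes value j with probability W_j.
    """
    counts = []
    remaining = n  # draws that have not yet been assigned a value
    while remaining > 0:
        w = rng.betavariate(1.0, theta)
        # Each unassigned draw independently takes the current value with probability w.
        hits = sum(1 for _ in range(remaining) if rng.random() < w)
        counts.append(hits)
        remaining -= hits
    return counts


def order_stats_and_gaps(counts):
    """Return (xs, gaps): xs is the sorted sample, gaps[i-1] is G_{i:n}."""
    # Expand the counts into the order statistics X_{1:n} <= ... <= X_{n:n}.
    xs = [j + 1 for j, c in enumerate(counts) for _ in range(c)]
    n = len(xs)
    # G_{i:n} = X_{n-i+1:n} - X_{n-i:n} for i = 1..n-1 (xs is 0-indexed).
    gaps = [xs[n - i] - xs[n - i - 1] for i in range(1, n)]
    # Convention from the abstract: G_{n:n} = X_{1:n} - 1.
    gaps.append(xs[0] - 1)
    return xs, gaps


if __name__ == "__main__":
    rng = random.Random(0)
    n, theta, trials = 5, 2.0, 5000
    totals = [0] * n
    for _ in range(trials):
        _, gaps = order_stats_and_gaps(sample_gem_counts(n, theta, rng))
        for i, g in enumerate(gaps):
            totals[i] += g
    for i, total in enumerate(totals, start=1):
        # Empirical mean of G_{i:n} next to theta / i.
        print(i, total / trials, theta / i)
```

By construction the gaps telescope, so `xs[-1] == 1 + sum(gaps)` holds for every simulated sample. If geometric$(p)$ is read as a law on $\{0,1,2,\ldots\}$ with mean $(1-p)/p$, the empirical mean of the $i$-th gap should be close to $\theta/i$.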

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1701.06294/full.md

## References

68 references — full list in the complete paper: https://tomesphere.com/paper/1701.06294/full.md

---
Source: https://tomesphere.com/paper/1701.06294