Estimating the size of a set using cascading exclusion

Sourav Chatterjee; Persi Diaconis; Susan Holmes

arXiv:2508.05901 · math.ST · April 28, 2026

TL;DR

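This paper introduces refined methods for estimating the size of a finite set from an i.i.d. uniform sample. The methods interpolate between the classical birthday-problem approach, which needs no structure on the set, and maximum-based estimators, which need the set to be an initial segment of the integers.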

Contribution

The paper develops a unified non-asymptotic theory for set size estimation. The theory covers volume estimation for compact convex sets, the unseen species problem, and testing problems, and it extends to regression-style predictors.

Findings

01

Provides non-asymptotic error bounds for set size estimators.

02

Interpolates between birthday problem and maximum-based estimators.

03

Applies to volume estimation, species problem, and testing scenarios.

Abstract

Let $S$ be a finite set, and $X_{1}, \dots, X_{n}$ an i.i.d. uniform sample from $S$. To estimate the size $|S|$, without further structure, one can wait for repeats and use the birthday problem. This requires a sample size of the order $|S|^{1/2}$. On the other hand, if $S = \{1, 2, \dots, |S|\}$, the maximum of the sample blown up by $n / (n - 1)$ gives an efficient estimator based on any growing sample size. This paper gives refinements that interpolate between these extremes. A general non-asymptotic theory is developed. This includes estimating the volume of a compact convex set, the unseen species problem, and a host of testing problems that follow from the question "Is this new observation a typical pick from a large prespecified population?" We also treat regression style predictors. A general theorem gives non-parametric finite $n$ error bounds in all cases.
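
The two extremes named in the abstract can be illustrated with a minimal Python sketch. This is not the paper's cascading exclusion method, only the two classical baselines it interpolates between. The function names `birthday_estimate` and `max_estimate` are hypothetical, and the collision-count form of the birthday estimator is a common textbook choice assumed here, not something taken from the paper. The scaling factor $n / (n - 1)$ is the one quoted in the abstract.

```python
import random
from collections import Counter


def birthday_estimate(sample):
    """Estimate |S| from repeats, with no structure assumed on S.

    Assumption (not from the paper): uses the moment identity
    E[number of colliding pairs] = C(n, 2) / |S| for a uniform sample.
    """
    n = len(sample)
    counts = Counter(sample)
    # Each value seen c times contributes c * (c - 1) / 2 colliding pairs.
    collisions = sum(c * (c - 1) // 2 for c in counts.values())
    if collisions == 0:
        return None  # no repeats observed, so this approach gives no estimate
    return n * (n - 1) / (2 * collisions)


def max_estimate(sample):
    """Estimate |S| when S = {1, ..., |S|}: sample maximum scaled by n / (n - 1)."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    return max(sample) * n / (n - 1)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is reproducible
    true_size = 10_000
    sample = [rng.randint(1, true_size) for _ in range(500)]
    print("birthday estimate:", birthday_estimate(sample))
    print("maximum estimate: ", max_estimate(sample))
```

With 500 draws from a set of size 10,000, the sample is a few multiples of $|S|^{1/2} = 100$, so the birthday estimate rests on only a handful of repeats and is noisy, while the maximum-based estimate exploits the integer structure and is typically much closer to the true size.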
