Concentration inequalities in the infinite urn scheme for occupancy counts and the missing mass, with applications
Anna Ben-Hamou, Stéphane Boucheron, Mesrob I. Ohannessian

TL;DR
This paper establishes Bernstein-type concentration inequalities for occupancy counts and missing mass in an infinite urn scheme, with tight variance bounds under regular variation, enabling improved confidence intervals for the Good–Turing estimator.
Contribution
It provides Bernstein-type concentration inequalities for occupancy counts and the missing mass that hold without any further assumption on the sampling distribution, shows that their variance factors are tight under regular variation, and applies them to confidence intervals for the Good–Turing estimator.
Findings
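Occupancy counts and the missing mass satisfy Bernstein-type concentration inequalities without any further assumption on the sampling distribution.
The variance factors in these inequalities are tight when the sampling distribution satisfies a regular variation property.
The inequalities yield confidence intervals for the Good–Turing estimator of the missing mass.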
Abstract
An infinite urn scheme is defined by a probability mass function $(p_j)_{j \geq 1}$ over positive integers. A random allocation consists of a sample of $N$ independent drawings according to this probability distribution, where $N$ may be deterministic or Poisson-distributed. This paper is concerned with occupancy counts, that is, with the number of symbols with $r$ or at least $r$ occurrences in the sample, and with the missing mass, that is, the total probability of all symbols that do not occur in the sample. Without any further assumption on the sampling distribution, these random quantities are shown to satisfy Bernstein-type concentration inequalities. The variance factors in these concentration inequalities are shown to be tight if the sampling distribution satisfies a regular variation property. This regular variation property reads as follows. Let the number of symbols with…
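The quantities named in the abstract can be made concrete with a small simulation. The following is a minimal sketch, not code from the paper: it assumes the infinite urn is truncated to a finite support so that it can be sampled, and the function names, the power-law choice of probabilities, and the sample size are all hypothetical, chosen only for illustration. It computes the occupancy counts (symbols with exactly r, or at least r, occurrences), the missing mass, and the Good–Turing estimate of the missing mass, taken here as the fraction of the sample made up of symbols seen exactly once.

```python
import random
from collections import Counter


def sample_urn(pmf, n, rng):
    """Draw n independent symbols (labelled 1, 2, ...) from a finite pmf."""
    symbols = range(1, len(pmf) + 1)
    return rng.choices(symbols, weights=pmf, k=n)


def occupancy_count(sample, r):
    """Number of distinct symbols occurring exactly r times in the sample."""
    counts = Counter(sample)
    return sum(1 for c in counts.values() if c == r)


def cumulative_occupancy_count(sample, r):
    """Number of distinct symbols occurring at least r times in the sample."""
    counts = Counter(sample)
    return sum(1 for c in counts.values() if c >= r)


def missing_mass(pmf, sample):
    """Total probability of the symbols that do not occur in the sample."""
    seen = set(sample)
    return sum(p for j, p in enumerate(pmf, start=1) if j not in seen)


def good_turing(sample):
    """Good-Turing estimate of the missing mass: singletons / sample size."""
    return occupancy_count(sample, 1) / len(sample)


if __name__ == "__main__":
    # Assumption: a power law p_j proportional to j^(-2), truncated to
    # 10,000 symbols, stands in for an infinite, heavy-tailed urn scheme.
    support = 10_000
    weights = [j ** -2.0 for j in range(1, support + 1)]
    total = sum(weights)
    pmf = [w / total for w in weights]

    rng = random.Random(0)  # fixed seed so the run is reproducible
    sample = sample_urn(pmf, n=1_000, rng=rng)

    print("symbols seen exactly once:", occupancy_count(sample, 1))
    print("distinct symbols seen:    ", cumulative_occupancy_count(sample, 1))
    print("true missing mass:        ", missing_mass(pmf, sample))
    print("Good-Turing estimate:     ", good_turing(sample))
```

Repeating the draw with different seeds gives a rough empirical picture of how the missing mass and the Good–Turing estimate fluctuate around each other, which is the kind of deviation the concentration inequalities are meant to control.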
