On the Concentration of the Missing Mass
Daniel Berend, Aryeh Kontorovich

TL;DR
This paper sharpens bounds on the probability that the missing mass (the total probability of the points not observed in a sample from a discrete distribution) deviates substantially from its expected value.
Contribution
It sharpens and simplifies the McAllester and Ortiz (JMLR, 2003) bounds on large deviations of the missing mass, and it refines and rigorously proves a fundamental inequality of Kearns and Saul (UAI, 1998).
Findings
Sharper bounds on the probability of large deviations of the missing mass
Simplification of McAllester and Ortiz's results (JMLR, 2003)
A refined and rigorously proved version of a fundamental inequality of Kearns and Saul (UAI, 1998)
Abstract
A random variable is sampled from a discrete distribution. The missing mass is the probability of the set of points not observed in the sample. We sharpen and simplify McAllester and Ortiz's results (JMLR, 2003) bounding the probability of large deviations of the missing mass. Along the way, we refine and rigorously prove a fundamental inequality of Kearns and Saul (UAI, 1998).
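To make the definition concrete, the following is a minimal Python sketch, not taken from the paper, that computes the missing mass of a sample and compares simulated values with the standard exact expectation, the sum over points i of p_i(1 - p_i)^n for a sample of size n. The function names (`missing_mass`, `expected_missing_mass`, `simulate`) and the example distribution are illustrative assumptions.

```python
import random


def missing_mass(probs, sample):
    """Total probability of the outcomes that never appear in `sample`.

    `probs[i]` is the probability of outcome i; `sample` holds outcome indices.
    """
    seen = set(sample)
    return sum(p for i, p in enumerate(probs) if i not in seen)


def expected_missing_mass(probs, n):
    """Exact expectation for n i.i.d. draws: sum_i p_i * (1 - p_i)^n."""
    return sum(p * (1 - p) ** n for p in probs)


def simulate(probs, n, trials, seed=0):
    """Draw `trials` independent samples of size n; return their missing masses."""
    rng = random.Random(seed)
    outcomes = range(len(probs))
    return [
        missing_mass(probs, rng.choices(outcomes, weights=probs, k=n))
        for _ in range(trials)
    ]


if __name__ == "__main__":
    # Hypothetical example: uniform distribution on 10 points, 10 draws.
    probs = [0.1] * 10
    values = simulate(probs, n=10, trials=2000)
    print("empirical mean:", sum(values) / len(values))
    print("exact expectation:", expected_missing_mass(probs, 10))
```

The spread of the simulated values around the exact expectation is the quantity that the paper's large-deviation bounds control; the sketch only illustrates the definition and does not reproduce any of those bounds.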
Taxonomy
Topics: Bayesian Modeling and Causal Inference · Bayesian Methods and Mixture Models · Statistical Methods and Inference
