Novel Deviation Bounds for Mixture of Independent Bernoulli Variables with Application to the Missing Mass
Bahman Yari Saeed Khanloo

TL;DR
This paper develops new distribution-free concentration inequalities for mixtures of Bernoulli variables, specifically applying them to derive sharp bounds on the missing mass, a key quantity in density estimation and learning theory.
Contribution
It introduces the first Bernstein-like large deviation bounds for missing mass, improving existing results and resolving heterogeneity issues in concentration inequalities for discrete distributions.
Findings
Derived Bernstein-like bounds with near-linear exponents for missing mass
Sharpened previous bounds for small deviations in large samples
Showed heterogeneity issues can be addressed with standard inequalities
Abstract
In this paper, we are concerned with obtaining distribution-free concentration inequalities for mixtures of independent Bernoulli variables that incorporate a notion of variance. The missing mass is the total probability mass associated with the outcomes that have not been seen in a given sample; it is an important quantity that connects density estimates obtained from a sample to the population for discrete distributions. We are therefore specifically motivated to apply our method to study the concentration of the missing mass, which can be expressed as a mixture of Bernoulli variables, in a novel way. We not only derive, for the first time, Bernstein-like large deviation bounds for the missing mass whose exponents behave almost linearly with respect to deviation size, but also sharpen the bounds of McAllester and Ortiz (2003) and Berend and Kontorovich (2013) for large sample sizes in the case of small…
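To make the definition of the missing mass concrete, the following minimal sketch computes it for a sample drawn from a known discrete distribution and compares a Monte Carlo average against the standard closed-form expectation, the sum over outcomes of p_i (1 - p_i)^n. The function names (`missing_mass`, `expected_missing_mass`, `simulate`) and the uniform example distribution are illustrative assumptions. This is not the paper's method and it does not reproduce any of its bounds.

```python
import random


def missing_mass(probs, sample):
    """Total probability of the outcomes that never appear in `sample`.

    `probs[i]` is the probability of outcome i; `sample` is a list of
    outcome indices.
    """
    seen = set(sample)
    return sum(p for i, p in enumerate(probs) if i not in seen)


def expected_missing_mass(probs, n):
    """Exact expectation for n i.i.d. draws: sum_i p_i * (1 - p_i)**n.

    Outcome i is unseen with probability (1 - p_i)**n, so the missing mass
    is a weighted sum of Bernoulli indicators with weights p_i.
    """
    return sum(p * (1.0 - p) ** n for p in probs)


def simulate(probs, n, trials, seed=0):
    """Draw `trials` samples of size n and return the missing mass of each."""
    rng = random.Random(seed)
    outcomes = range(len(probs))
    return [
        missing_mass(probs, rng.choices(outcomes, weights=probs, k=n))
        for _ in range(trials)
    ]


if __name__ == "__main__":
    # Hypothetical example: uniform distribution over 10 outcomes, 20 draws.
    probs = [0.1] * 10
    values = simulate(probs, n=20, trials=2000)
    print("empirical mean:", sum(values) / len(values))
    print("exact mean:    ", expected_missing_mass(probs, 20))
```

Plotting a histogram of the values returned by `simulate` gives a rough picture of how tightly the missing mass concentrates around its mean, which is the quantity the deviation bounds in the paper control.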
Taxonomy
Topics: Statistical Mechanics and Entropy · Diverse Scientific and Engineering Research · Advanced Statistical Methods and Models
