On the Impossibility of Learning the Missing Mass

Elchanan Mossel; Mesrob I. Ohannessian

arXiv:1503.03613 · stat.ML · March 13, 2015

TL;DR

This paper proves that the probability of unseen outcomes (the "missing mass") of a discrete distribution cannot be learned to small relative error from an i.i.d. sample without additional structural assumptions, formalizing the folklore that predicting rare events requires heavy-tailed distributions.

Contribution

It establishes the fundamental impossibility of distribution-free PAC learning of missing mass in relative error, formalizing the necessity of heavy tails for predicting rare events.
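To make the statement concrete, here is one standard way to write the two notions involved. This is our paraphrase, not a quotation; the symbols $M_n$, $\hat M_n$ and $n_0$ are our own notation and may differ from the paper's. Let $X_1, \dots, X_n$ be i.i.d. draws from a distribution $p = (p_j)$ on a countable set.

```latex
% Missing mass: total probability of the symbols that do not appear in the sample
M_n(p) = \sum_{j} p_j \, \mathbf{1}\{ X_i \neq j \ \text{for all } i \le n \}

% Distribution-free PAC learnability in relative error: there is an estimator
% \hat M_n = \hat M_n(X_1, \dots, X_n) such that for every \varepsilon, \delta > 0
% there exists n_0(\varepsilon, \delta), NOT depending on p, with
\Pr\!\left( \left| \frac{\hat M_n}{M_n(p)} - 1 \right| > \varepsilon \right) < \delta
\quad \text{for all } p \text{ and all } n > n_0(\varepsilon, \delta).
```

The impossibility result says that no estimator satisfies the second condition uniformly over all discrete distributions.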

Findings

01

Missing mass is not distribution-free PAC-learnable in relative error.

02

Predicting rare events requires assumptions about distribution tails.

03

Heavy-tailed distributions are necessary for learning the probability of unseen outcomes.

Abstract

This paper shows that one cannot learn the probability of rare events without imposing further structural assumptions. The event of interest is that of obtaining an outcome outside the coverage of an i.i.d. sample from a discrete distribution. The probability of this event is referred to as the "missing mass". The impossibility result can then be stated as: the missing mass is not distribution-free PAC-learnable in relative error. The proof is semi-constructive and relies on a coupling argument using a dithered geometric distribution. This result formalizes the folklore that in order to predict rare events, one necessarily needs distributions with "heavy tails".
