Concentration of the missing mass in metric spaces
Andreas Maurer

TL;DR
This paper investigates the estimation and concentration of the probability of observing data beyond a certain distance from an iid sample in metric spaces, extending classical missing mass problems to continuous settings.
Contribution
It introduces estimators for the conditional missing mass, shows that the expected missing mass is difficult to estimate in general, and identifies conditions on the distribution under which the Good-Turing estimator and the conditional missing mass concentrate on their expectations.
Findings
Good-Turing estimator concentrates under certain conditions
Estimation of expected missing mass is generally challenging
Applications include anomaly detection and Wasserstein distance analysis
Abstract
We study the probability of observing data farther than a specified distance from a given iid sample in a metric space, together with its estimation and its concentration on its expectation. The problem extends the classical problem of estimating the missing mass in discrete spaces. We give some estimators for the conditional missing mass and show that estimation of the expected missing mass is difficult in general. Conditions on the distribution under which the Good-Turing estimator and the conditional missing mass concentrate on their expectations are identified. Applications to anomaly detection, coding, the Wasserstein distance between true and empirical measure, and simple learning bounds are sketched.
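To make the quantity in the abstract concrete, the following is a minimal sketch of a Good-Turing-style estimate of the missing mass at distance r. It rests on an assumption not spelled out in the abstract: that the estimator is the fraction of sample points with no other sample point within distance r (a leave-one-out count). The function name `good_turing_missing_mass` and the choice of the Euclidean metric are hypothetical, used here only for illustration.

```python
import numpy as np

def good_turing_missing_mass(sample, r):
    """Fraction of sample points whose nearest other sample point is farther than r.

    Assumption: this leave-one-out count is used as a Good-Turing-style
    estimate of the probability that a fresh draw lands farther than r
    from the whole sample. The Euclidean metric is an illustrative choice.
    """
    x = np.asarray(sample, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # treat a flat list as points on the real line
    # Pairwise Euclidean distances, shape (n, n).
    diffs = x[:, None, :] - x[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=-1))
    # Exclude each point's zero distance to itself.
    np.fill_diagonal(dist, np.inf)
    # A point is "isolated" if every other sample point is farther than r.
    isolated = dist.min(axis=1) > r
    return float(isolated.mean())

# Discrete sanity check: with integer labels and r = 0.5, isolated points
# are exactly the labels seen once, so the value is (#singletons) / n.
labels = [1, 1, 2, 3]
estimate = good_turing_missing_mass(labels, r=0.5)
```

With integer labels and any r below 1, the isolated points are the labels observed exactly once, so the sketch reduces to the classical Good-Turing estimate of the missing mass in the discrete setting.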
Taxonomy
Topics: Data-Driven Disease Surveillance · Statistical Methods and Inference · Advanced Statistical Methods and Models
