Concentration of the missing mass in metric spaces

Andreas Maurer

arXiv:2206.02012 · math.ST · November 23, 2022

TL;DR

This paper investigates the estimation and concentration of the probability of observing data beyond a certain distance from an iid sample in metric spaces, extending classical missing mass problems to continuous settings.

Contribution

It introduces estimators for the conditional missing mass, analyzes their concentration properties, and identifies conditions under which classical estimators like Good-Turing are effective.

Findings

01 Good-Turing estimator concentrates under certain conditions

02 Estimation of expected missing mass is generally challenging

03 Applications include anomaly detection and Wasserstein distance analysis

Abstract

We study the probability of observing data farther than a specified distance from a given iid sample in a metric space: both its estimation and its concentration around its expectation. The problem extends the classical problem of estimating the missing mass in discrete spaces. We give some estimators for the conditional missing mass and show that estimating the expected missing mass is difficult in general. We identify conditions on the distribution under which the Good-Turing estimator and the conditional missing mass concentrate around their expectations. We sketch applications to anomaly detection, coding, the Wasserstein distance between the true and empirical measures, and simple learning bounds.
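
To make the quantities in the abstract concrete, the following is a minimal sketch, not the paper's own construction. It assumes the conditional missing mass at radius r is the probability that a fresh draw lies farther than r from every sample point, and it assumes a metric-space analogue of the Good-Turing estimator given by the fraction of sample points that have no other sample point within distance r (in a discrete space with the 0/1 metric this reduces to the classical count of singletons divided by n). The function names, the Gaussian test distribution, and the radius 0.3 are all hypothetical choices made for illustration.

```python
import math
import random


def good_turing_estimate(sample, r, dist):
    """Fraction of sample points with no OTHER sample point within distance r.

    Assumed metric-space analogue of Good-Turing (an illustration only).
    """
    n = len(sample)
    isolated = 0
    for i, x in enumerate(sample):
        if all(dist(x, y) > r for j, y in enumerate(sample) if j != i):
            isolated += 1
    return isolated / n


def conditional_missing_mass_mc(sample, r, dist, draw, n_test=2000, rng=None):
    """Monte Carlo approximation of P(d(Y, X_i) > r for all i | sample).

    `draw(rng)` must return one fresh point from the (here known) distribution;
    in practice the distribution is unknown, which is why estimators are needed.
    """
    rng = rng or random.Random(0)
    far = 0
    for _ in range(n_test):
        y = draw(rng)
        if all(dist(y, x) > r for x in sample):
            far += 1
    return far / n_test


def euclid(a, b):
    """Euclidean metric on tuples of coordinates."""
    return math.dist(a, b)


def discrete_metric(a, b):
    """0/1 metric, which recovers the classical discrete setting."""
    return 0.0 if a == b else 1.0


def draw_gaussian(rng):
    """Hypothetical test distribution: standard Gaussian in the plane."""
    return (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))


rng = random.Random(42)
sample = [draw_gaussian(rng) for _ in range(200)]
gt = good_turing_estimate(sample, 0.3, euclid)
mm = conditional_missing_mass_mc(sample, 0.3, euclid, draw_gaussian, rng=rng)
print(f"Good-Turing style estimate: {gt:.3f}")
print(f"Monte Carlo missing mass:   {mm:.3f}")
```

The Monte Carlo routine is only a reference value that is available because the simulation knows the true distribution; the estimator uses the sample alone. Comparing the two numbers across several seeds and radii gives an informal feel for the concentration question the abstract raises.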

Taxonomy

Topics: Data-Driven Disease Surveillance · Statistical Methods and Inference · Advanced Statistical Methods and Models