Missing $g$-mass: Investigating the Missing Parts of Distributions

Prafulla Chandra; Andrew Thangaraj

arXiv:2110.01968 · math.ST · May 30, 2023

TL;DR

This paper introduces the concept of the missing $g$-mass to analyze unobserved parts of large distributions, providing new estimation techniques and concentration bounds for various functions of the missing distribution.

Contribution

It defines the missing $g$-mass, studies minimax estimation of the order-$\alpha$ missing mass, and develops new concentration bounds, including strongly sub-Gamma and filtered sub-Gaussian types.

Findings

01

Exact minimax convergence rates for the order-$\alpha$ missing mass.

02

Sub-Gaussian tail bounds with near-optimal variance factors.

03

Introduction of strongly sub-Gamma and filtered sub-Gaussian concentration notions.
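The order-$\alpha$ missing mass in these findings is the missing $g$-mass with $g(p) = p^{\alpha}$, that is, $M_\alpha = \sum_{x \text{ unseen}} p_x^{\alpha}$. The Python below is a minimal illustrative sketch, not the paper's estimator or its concentration bounds, and its function names are hypothetical. It assumes the true distribution is known. Because the $n$ draws are i.i.d., letter $x$ is missing with probability $(1-p_x)^n$, so by linearity of expectation $\mathbb{E}[M_\alpha] = \sum_x p_x^{\alpha}(1-p_x)^n$. A simulation then shows how realized values of $M_\alpha$ scatter around that mean.

```python
import random


def expected_order_alpha_missing_mass(probs, n, alpha):
    """E[M_alpha] = sum_x p_x^alpha * (1 - p_x)^n.

    Letter x is absent from n i.i.d. draws with probability (1 - p_x)^n,
    and linearity of expectation sums these contributions.
    Zero-probability letters are skipped (they can never be "missing mass").
    """
    return sum(p ** alpha * (1.0 - p) ** n for p in probs if p > 0)


def simulate_order_alpha_missing_mass(probs, n, alpha, trials, seed=0):
    """Draw `trials` samples of size n (hypothetical helper, for illustration).

    Returns the realized order-alpha missing mass of each sample.
    """
    rng = random.Random(seed)
    letters = range(len(probs))
    values = []
    for _ in range(trials):
        seen = set(rng.choices(letters, weights=probs, k=n))
        values.append(
            sum(probs[x] ** alpha for x in letters if x not in seen and probs[x] > 0)
        )
    return values


if __name__ == "__main__":
    # Uniform distribution on k letters, with fewer samples than letters.
    k, n, alpha = 500, 200, 1.5
    probs = [1.0 / k] * k
    exact = expected_order_alpha_missing_mass(probs, n, alpha)
    vals = simulate_order_alpha_missing_mass(probs, n, alpha, trials=200)
    print("exact mean:", exact)
    print("simulated mean:", sum(vals) / len(vals))
    print("observed range:", min(vals), max(vals))
```

The gap between the observed minimum and maximum gives an informal sense of the fluctuations that concentration bounds such as those listed above are meant to control.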

Abstract

Estimating the underlying distribution from i.i.d. samples is a classical and important problem in statistics. When the alphabet size is large compared to the number of samples, a portion of the distribution is highly likely to be unobserved or sparsely observed. The missing mass, defined as the sum of probabilities $\Pr(x)$ over the missing letters $x$, and the Good-Turing estimator for the missing mass have been important tools in large-alphabet distribution estimation. In this article, given a positive function $g$ from $[0, 1]$ to the reals, the missing $g$-mass, defined as the sum of $g(\Pr(x))$ over the missing letters $x$, is introduced and studied. The missing $g$-mass can be used to investigate the structure of the missing part of the distribution. Specific applications for special cases such as the order-$\alpha$ missing mass ($g(p) = p^{\alpha}$) and the missing Shannon…
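The definitions above translate directly into code. The Python below is an illustrative sketch under stated assumptions, not code from the paper. It assumes the true distribution is known and stored as a dict from letters to probabilities. It computes the missing $g$-mass for any user-supplied $g$, together with the classical Good-Turing estimate of the ordinary missing mass ($g(p) = p$): the number of letters seen exactly once, divided by the sample size $n$. The helper names `missing_g_mass` and `good_turing_missing_mass` are hypothetical.

```python
import random
from collections import Counter
from typing import Callable, Dict, Hashable, List


def missing_g_mass(dist: Dict[Hashable, float], sample: List[Hashable],
                   g: Callable[[float], float]) -> float:
    """Sum of g(p_x) over letters x with p_x > 0 that never appear in the sample."""
    seen = set(sample)
    return sum(g(p) for x, p in dist.items() if p > 0 and x not in seen)


def good_turing_missing_mass(sample: List[Hashable]) -> float:
    """Good-Turing estimate of the ordinary missing mass: (# letters seen once) / n."""
    counts = Counter(sample)
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(sample)


if __name__ == "__main__":
    # Large alphabet relative to the sample size, so many letters go unseen.
    k, n = 1000, 300
    dist = {x: 1.0 / k for x in range(k)}
    rng = random.Random(0)
    sample = rng.choices(list(dist), weights=list(dist.values()), k=n)
    print("missing mass:", missing_g_mass(dist, sample, lambda p: p))
    print("order-2 missing mass:", missing_g_mass(dist, sample, lambda p: p ** 2))
    print("Good-Turing estimate:", good_turing_missing_mass(sample))
```

Passing a different `g`, for example `lambda p: p ** alpha` for some chosen `alpha`, yields another special case of the missing $g$-mass without changing the helper. The Good-Turing function needs only the sample, whereas `missing_g_mass` needs the true distribution and therefore serves only as a reference quantity in simulations.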

Taxonomy

Topics: Bayesian Methods and Mixture Models · Statistical Mechanics and Entropy · Machine Learning and Algorithms