On Missing Mass Variance

Maciej Skorski

arXiv:2104.07028·cs.IT·April 16, 2021

TL;DR

This paper determines the maximal variance of the missing mass (the total probability of elements not observed in a sample) for any sample and alphabet sizes, which helps in understanding how the missing mass concentrates.

Contribution

It determines the maximal variance of the missing mass for any sample and alphabet sizes, advancing understanding of its concentration behavior.

Findings

01. Determines the maximal variance of the missing mass for any sample and alphabet sizes

02. Helps in understanding the concentration properties of the missing mass

03. Relevant to areas where the missing mass is studied, including ecology, linguistics, networks, and information theory

Abstract

The missing mass is the total probability of the elements not observed in a sample. Since the work of Good and Turing during WWII, it has been studied extensively in many areas, including ecology, linguistics, networks, and information theory. This work determines the maximal variance of the missing mass for any sample and alphabet sizes. The result helps in understanding the concentration properties of the missing mass.
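To make the definition concrete, the following minimal Python sketch computes the missing mass of a sample and its variance for a fixed distribution. It is an illustration only and is not taken from the paper. It assumes i.i.d. sampling from a known distribution `probs`, and the function names (`missing_mass`, `exact_mean_and_variance`, `simulated_variance`) are hypothetical. It evaluates the variance for one given distribution and does not search for the maximizing distribution that the paper characterizes.

```python
import random


def missing_mass(probs, sample):
    """Total probability of the symbols that never occur in `sample`."""
    seen = set(sample)
    return sum(p for symbol, p in enumerate(probs) if symbol not in seen)


def exact_mean_and_variance(probs, n):
    """Exact mean and variance of the missing mass for n i.i.d. draws."""
    # E[M] = sum_i p_i * (1 - p_i)^n
    mean = sum(p * (1.0 - p) ** n for p in probs)
    # E[M^2] = sum_{i,j} p_i * p_j * P(neither i nor j is drawn)
    second = 0.0
    for i, p in enumerate(probs):
        for j, q in enumerate(probs):
            # max(...) guards against tiny negative values from rounding
            unseen = (1.0 - p) if i == j else max(0.0, 1.0 - p - q)
            second += p * q * unseen ** n
    return mean, second - mean ** 2


def simulated_variance(probs, n, trials=20000, seed=0):
    """Monte Carlo estimate of the missing mass variance (sanity check)."""
    rng = random.Random(seed)
    symbols = range(len(probs))
    values = [
        missing_mass(probs, rng.choices(symbols, weights=probs, k=n))
        for _ in range(trials)
    ]
    avg = sum(values) / trials
    return sum((v - avg) ** 2 for v in values) / trials


if __name__ == "__main__":
    probs = [0.5, 0.3, 0.2]  # assumed example distribution
    mean, var = exact_mean_and_variance(probs, n=5)
    print("exact mean:", mean)
    print("exact variance:", var)
    print("simulated variance:", simulated_variance(probs, n=5))
```

The exact computation costs time quadratic in the alphabet size, so it suits small alphabets; the simulation is included only as a cross-check of the closed-form expressions.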

Taxonomy

Topics: Algorithms and Data Compression · Bayesian Methods and Mixture Models · DNA and Biological Computing