Non-Bayesian Parametric Missing-Mass Estimation
Shir Cohen, Tirza Routtenberg, and Lang Tong

TL;DR
This paper introduces a non-Bayesian (frequentist) framework and a lower bound for missing-mass estimation. The bound serves as a benchmark for estimators such as CML, Good-Turing, and Laplace, and the framework also yields a way to improve their performance.
Contribution
It develops a non-Bayesian bound of the constrained Cramér-Rao bound (CCRB) type for missing-mass estimation, the mmCCRB, and proposes an iterative Fisher scoring method to enhance existing estimators.
Findings
The mmCCRB provides a valid lower bound on estimator performance.
The Fisher scoring method improves the Laplace estimator.
Numerical results confirm the bound's effectiveness.
Abstract
We consider the classical problem of missing-mass estimation, which deals with estimating the total probability of unseen elements in a sample. The missing-mass estimation problem has various applications in machine learning, statistics, language processing, ecology, sensor networks, and other fields. The naive constrained maximum likelihood (CML) estimator is inappropriate for this problem since it tends to overestimate the probability of the observed elements. Similarly, the conventional constrained Cramér-Rao bound (CCRB), which is a lower bound on the mean-squared error (MSE) of unbiased estimators, does not provide a relevant bound on the performance for this problem. In this paper, we introduce a frequentist, non-Bayesian parametric model of the problem of missing-mass estimation. We introduce the concept of missing-mass unbiasedness by using the Lehmann unbiasedness definition. We…
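To make the quantities in the abstract concrete, the following minimal sketch computes the true missing mass of a sample and three textbook estimates of it. It is an illustration under stated assumptions, not the paper's implementation: the function names, the toy distribution, and the assumption of a known finite alphabet size are all hypothetical, and the estimator forms are the standard ones (empirical frequencies for CML, singleton fraction for Good-Turing, add-one smoothing for Laplace), which may differ in detail from those analyzed in the paper.

```python
from collections import Counter


def true_missing_mass(sample, probs):
    """Total probability of the symbols that never appear in the sample."""
    seen = set(sample)
    return sum(p for symbol, p in probs.items() if symbol not in seen)


def missing_mass_estimates(sample, alphabet_size):
    """Textbook missing-mass estimates (assumed forms, for illustration only).

    alphabet_size is assumed known here; it is needed only by the
    add-one (Laplace) estimate.
    """
    n = len(sample)
    counts = Counter(sample)
    n_seen = len(counts)                                   # distinct observed symbols
    n_singletons = sum(1 for c in counts.values() if c == 1)

    return {
        # Empirical frequencies sum to 1 over the observed symbols,
        # so nothing is left for the unseen ones.
        "cml": 0.0,
        # Good-Turing: fraction of the sample made up of symbols seen once.
        "good_turing": n_singletons / n,
        # Add-one smoothing: each unseen symbol gets 1 / (n + alphabet_size).
        "laplace": (alphabet_size - n_seen) / (n + alphabet_size),
    }


# Hypothetical toy distribution over five symbols and a sample of size six.
probs = {"a": 0.4, "b": 0.3, "c": 0.15, "d": 0.1, "e": 0.05}
sample = ["a", "b", "a", "c", "a", "b"]

truth = true_missing_mass(sample, probs)
estimates = missing_mass_estimates(sample, alphabet_size=len(probs))
print("true missing mass:", truth)
for name, value in estimates.items():
    print(name, value)
```

In this toy case the symbols "d" and "e" are unseen, so the true missing mass is their combined probability. The CML-style estimate is exactly zero because all probability is assigned to observed symbols, which is the overestimation of observed elements that the abstract describes.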
