Optimal estimation of high-order missing masses, and the rare-type match problem
Stefano Favaro, Zacharie Naulet

TL;DR
This paper develops optimal nonparametric estimators for high-order missing masses in discrete distributions, analyzes their minimax properties, and applies these results to forensic likelihood ratio estimation.
Contribution
It introduces a new estimator for the r-order missing mass, establishes conditions for its minimax optimality, and connects the estimation to forensic likelihood ratio problems.
Findings
The estimator is consistent under certain tail conditions.
Minimax estimation is impossible over all discrete distributions.
The estimator achieves minimax optimality under stronger tail assumptions.
Abstract
Consider a random sample $(X_1,\ldots,X_n)$ from an unknown discrete distribution $P=\sum_{j\geq1}p_j\delta_{s_j}$ on a countable alphabet $\mathbb{S}$, and let $(Y_{n,j})_{j\geq1}$ be the empirical frequencies of distinct symbols $s_j$'s in the sample. We consider the problem of estimating the $r$-order missing mass, which is a discrete functional of $P$ defined as $$\theta_r(P;\mathbf{X}_n)=\sum_{j\geq1}p_j^r\,I(Y_{n,j}=0).$$ This is a generalization of the missing mass whose estimation is a classical problem in statistics, being the subject of numerous studies both in theory and methods. First, we introduce a nonparametric estimator of $\theta_r(P;\mathbf{X}_n)$ and a corresponding non-asymptotic confidence interval through concentration properties of $\theta_r(P;\mathbf{X}_n)$. Then, we investigate minimax estimation of $\theta_r(P;\mathbf{X}_n)$, which is…
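To make the functional concrete, here is a minimal Python sketch. It assumes the usual reading of the r-order missing mass: the sum of p_j^r over the symbols that never appear in the sample. The function names `missing_mass` and `good_turing_style` are hypothetical and chosen only for illustration. The second function is not claimed to be the estimator introduced in the paper; it is a Good-Turing-style quantity, M_{n,r} / C(n, r), where M_{n,r} is the number of symbols observed exactly r times. For r = 1 it reduces to the classical Good-Turing estimator M_{n,1} / n of the missing mass.

```python
from collections import Counter
from math import comb


def missing_mass(probs, sample, r=1):
    """Oracle r-order missing mass for a KNOWN distribution.

    probs  : dict mapping symbol -> probability (assumed to sum to 1)
    sample : iterable of observed symbols
    Returns the sum of p_j**r over symbols that do not occur in the sample.
    """
    seen = set(sample)
    return sum(p ** r for symbol, p in probs.items() if symbol not in seen)


def good_turing_style(sample, r=1):
    """Good-Turing-style estimate M_{n,r} / C(n, r) from the sample alone.

    Illustrative assumption, not necessarily the paper's estimator.
    M_{n,r} counts the symbols seen exactly r times among n observations.
    """
    sample = list(sample)
    n = len(sample)
    if not 1 <= r <= n:
        raise ValueError("r must satisfy 1 <= r <= n")
    counts = Counter(sample)
    m_r = sum(1 for c in counts.values() if c == r)
    return m_r / comb(n, r)


if __name__ == "__main__":
    # Toy three-symbol distribution; "c" is absent from the sample.
    probs = {"a": 0.5, "b": 0.3, "c": 0.2}
    sample = ["a", "a", "b"]
    print(missing_mass(probs, sample, r=1))
    print(missing_mass(probs, sample, r=2))
    print(good_turing_style(sample, r=1))
```

The oracle version needs the true probabilities, so it is only useful for simulations; the second function uses the observed frequencies alone, which is the setting of the estimation problem described in the abstract.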
Taxonomy
Topics: Bayesian Methods and Mixture Models · Statistical Methods and Inference · Random Matrices and Applications
