Extrapolating the profile of a finite population
Soham Jana, Yury Polyanskiy, Yihong Wu

TL;DR
This paper shows that the population profile, meaning the empirical distribution of the sizes of each type, can be consistently estimated from a small subsample in the sublinear regime, using a linear programming approach and complex analysis.
Contribution
It introduces a method to estimate the population profile in the sublinear sampling regime and characterizes its minimax optimality via an infinite-dimensional LP.
Findings
Consistent profile estimation (in total variation) is possible when the sample size grows faster than k / log k.
The optimal convergence rate in the linear regime is Theta(1 / log k).
A single infinite-dimensional LP characterizes the estimator's risk and optimality.
Abstract
We study a prototypical problem in empirical Bayes. Namely, consider a population consisting of $k$ individuals each belonging to one of $k$ types (some types can be empty). Without any structural restrictions, it is impossible to learn the composition of the full population having observed only a small (random) subsample of size $m = o(k)$. Nevertheless, we show that in the sublinear regime of $m = \omega(k/\log k)$, it is possible to consistently estimate in total variation the \emph{profile} of the population, defined as the empirical distribution of the sizes of each type, which determines many symmetric properties of the population. We also prove that in the linear regime of $m = ck$ for any constant $c$ the optimal rate is $\Theta(1/\log k)$. Our estimator is based on Wolfowitz's minimum distance method, which entails solving a linear program (LP) of size $k/\log k$. We show that there is…
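To make the definition of the profile concrete, here is a minimal Python sketch. It is an illustration only, not the paper's estimator: the function name `profile`, the toy population, and the convention of counting empty types at size 0 and normalizing by the number of types `k` are all assumptions made for this example.

```python
from collections import Counter
import random


def profile(labels, k):
    """Empirical distribution of type sizes (hypothetical helper).

    labels: the type label of each observed individual.
    k: total number of types, including empty ones (assumed convention).
    Returns {size: fraction of the k types having exactly that size}.
    """
    type_sizes = Counter(labels)                 # type -> number of individuals
    size_counts = Counter(type_sizes.values())   # size -> number of types with that size
    size_counts[0] = k - len(type_sizes)         # types with no individuals
    return {s: c / k for s, c in sorted(size_counts.items()) if c > 0}


# Toy population: 6 individuals, 6 possible types, only types 0, 1, 2 occupied.
population = [0, 0, 0, 1, 1, 2]
pi = profile(population, k=6)

# Naive plug-in profile of a random subsample drawn without replacement.
rng = random.Random(0)
subsample = rng.sample(population, 3)
pi_sub = profile(subsample, k=6)
```

In this toy case `pi` puts mass 1/2 on size 0 and mass 1/6 on each of sizes 1, 2, and 3. The plug-in profile `pi_sub` of the subsample is generally a poor proxy for `pi`, since type sizes shrink under subsampling; that gap is what a dedicated estimator such as the minimum distance method described in the abstract is meant to close.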
Taxonomy
Topics: Bayesian Methods and Mixture Models · Statistical Methods and Inference · Machine Learning and Algorithms
