Estimating the number of unseen species: A bird in the hand is worth $\log n$ in the bush
Alon Orlitsky, Ananda Theertha Suresh, Yihong Wu

TL;DR
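This paper introduces estimators that provably predict the number of previously unseen species a new sample would reveal, even when the new sample is larger than the observed one by a factor proportional to $\log n$, where $n$ is the observed sample size. The classical Good–Toulmin estimator is guaranteed only for new samples no larger than the observed one.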
Contribution
The authors derive simple linear estimators that accurately estimate the number of unseen species for new samples up to a multiple of the observed sample size $n$ proportional to $\log n$, and show that their mean-square error is optimal up to constant factors.
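The summary does not spell out the estimators, so the following is only a minimal sketch under stated assumptions. It assumes a "smoothed Good–Toulmin" form: start from the Good–Toulmin series $-\sum_i (-t)^i \Phi_i$, where $\Phi_i$ is the number of species seen exactly $i$ times, and multiply the $i$-th term by $P(L \ge i)$ for a random cutoff $L \sim \mathrm{Poisson}(r)$. The helper names (`prevalences`, `good_toulmin`, `poisson_tail`, `smoothed_good_toulmin`) are hypothetical, and the smoothing parameter `r` is left as a free input rather than any tuned value from the paper.

```python
import math
from collections import Counter


def prevalences(sample):
    """Map i -> Phi_i, the number of distinct species seen exactly i times."""
    return Counter(Counter(sample).values())


def good_toulmin(sample, t):
    """Classical Good-Toulmin estimate: U = -sum_i (-t)^i * Phi_i."""
    phi = prevalences(sample)
    return -sum((-t) ** i * phi_i for i, phi_i in phi.items())


def poisson_tail(r, i):
    """P(L >= i) for L ~ Poisson(r), with r > 0."""
    if i <= 0:
        return 1.0

    def pmf(k):
        # Computed in log space so large k or r does not overflow.
        return math.exp(-r + k * math.log(r) - math.lgamma(k + 1))

    if i <= r:
        # The tail is not small here, so 1 - CDF loses little precision.
        return 1.0 - sum(pmf(k) for k in range(i))
    # The tail is small here: sum it directly to avoid cancellation.
    total, k, p = 0.0, i, pmf(i)
    while p > 1e-17 * total:
        total += p
        k += 1
        p *= r / k
    return total


def smoothed_good_toulmin(sample, t, r):
    """Good-Toulmin series with term i damped by P(L >= i), L ~ Poisson(r).

    Hypothetical sketch: r is a free input here, not a tuned choice.
    """
    phi = prevalences(sample)
    return -sum((-t) ** i * poisson_tail(r, i) * phi_i for i, phi_i in phi.items())
```

For the toy sample `["a", "b", "b", "c", "c", "c"]` with `t = 0.5`, `good_toulmin` returns 0.375. As `r` grows, every damping factor approaches 1 and the plain Good–Toulmin series is recovered; a smaller `r` suppresses the high-order terms whose coefficients $t^i$ explode when $t > 1$, trading some bias for a large reduction in variance.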
Findings
The estimators predict the number of unseen species for new samples up to a factor proportional to $\log n$ larger than the observed sample size $n$.
The proposed estimators outperform existing methods on synthetic and real datasets.
The approach applies uniformly across multiple sampling models and is computationally efficient.
Abstract
Estimating the number of unseen species is an important problem in many scientific endeavors. Its most popular formulation, introduced by Fisher, uses $n$ samples to predict the number $U$ of hitherto unseen species that would be observed if $t \cdot n$ new samples were collected. Of considerable interest is the largest ratio $t$ between the number of new and existing samples for which $U$ can be accurately predicted. In seminal works, Good and Toulmin constructed an intriguing estimator that predicts $U$ for all $t \le 1$, thereby showing that the number of species can be estimated for a population twice as large as that observed. Subsequently, Efron and Thisted obtained a modified estimator that empirically predicts $U$ even for some $t > 1$, but without provable guarantees. We derive a class of estimators that predict $U$ not just for constant $t > 1$, but all the…
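The Good–Toulmin estimator mentioned in the abstract can be motivated by a short calculation. Assume the Poisson sampling model: a species $x$ with probability $p_x$ appears $N_x \sim \mathrm{Poi}(n p_x)$ times in the observed sample and, independently, $N'_x \sim \mathrm{Poi}(n t p_x)$ times in the new one, so that $U = \sum_x \mathbf{1}\{N_x = 0,\ N'_x > 0\}$. Writing $\Phi_i$ for the number of species seen exactly $i$ times and expanding $1 - e^{-y} = -\sum_{i \ge 1} (-y)^i / i!$:

```latex
\begin{aligned}
\mathbb{E}[U] &= \sum_x e^{-n p_x}\bigl(1 - e^{-n t p_x}\bigr)
  = -\sum_x e^{-n p_x} \sum_{i \ge 1} \frac{(-n t p_x)^i}{i!} \\
&= -\sum_{i \ge 1} (-t)^i \sum_x e^{-n p_x} \frac{(n p_x)^i}{i!}
  = -\sum_{i \ge 1} (-t)^i \, \mathbb{E}[\Phi_i],
\qquad\text{so}\qquad
U^{\mathrm{GT}} = -\sum_{i \ge 1} (-t)^i \, \Phi_i
\end{aligned}
```

is unbiased. For $t > 1$, however, the coefficients $(-t)^i$ grow geometrically in magnitude, so a single species with a large count can swing the estimate wildly; this is why its guarantees stop at $t \le 1$.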
Taxonomy
Topics: Census and Population Estimation · Botanical Research and Chemistry · Bayesian Methods and Mixture Models
