Better lower bounds for missing species: improved non-parametric moment-based estimation for large experiments
Timothy Daley, Andrew D Smith

TL;DR
This paper introduces a fast, non-parametric, moment-based estimator that gives improved lower bounds on the number of unobserved species in large, heterogeneous populations, with applications in genomics and single-cell analysis.
Contribution
It develops an efficient, flexible estimator that extends the classical moment-based lower bound of Chao (1984) and can handle the large-scale, heterogeneous data produced by modern biological experiments.
Findings
The estimator is computationally efficient on large datasets
It gives conservative estimates (lower bounds) while using more of the information in the sample
It was applied successfully to T-cell receptor and single-cell RNA-seq data
Abstract
Estimation of the number of species or unobserved classes from a random sample of the underlying population is a ubiquitous problem in statistics. In classical settings, the size of the sample is usually small. New technologies such as high-throughput DNA sequencing have allowed for the sampling of extremely large and heterogeneous populations at scales not previously attainable or even considered. New algorithms are required that take advantage of the size of the data to account for heterogeneity, but are also sufficiently fast and scale well with large data. We present a non-parametric moment-based estimator that is both computationally efficient and sufficiently flexible to account for heterogeneity in the abundances of the underlying population. This estimator is based on an extension of a popular moment-based lower bound (Chao, 1984), originally developed by Harris (1959) but…
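For orientation, the classical moment-based lower bound of Chao (1984) that the abstract builds on can be sketched as below. This is only the well-known classical bound (unobserved species ≥ f1² / (2·f2), where f1 and f2 are the numbers of species seen exactly once and exactly twice), not the improved estimator the paper proposes. The function name and the toy counts are hypothetical, chosen for illustration.

```python
from collections import Counter


def chao_unobserved_lower_bound(abundances):
    """Classical Chao (1984) lower bound on the number of unobserved species.

    abundances: iterable of positive integer counts, one per observed species.
    Illustrative sketch only; not the estimator introduced in the paper.
    """
    # freq[j] = number of species observed exactly j times
    freq = Counter(abundances)
    f1, f2 = freq[1], freq[2]
    if f2 == 0:
        # The plain form of the bound divides by f2, so it is undefined here.
        raise ValueError("bound is undefined when no species is seen exactly twice")
    return f1 * f1 / (2 * f2)


# Hypothetical toy sample: 8 observed species with these counts.
counts = [1, 1, 1, 1, 2, 2, 3, 5]
unseen = chao_unobserved_lower_bound(counts)  # f1 = 4, f2 = 2
total = len(counts) + unseen                  # lower bound on total richness
```

Only the singleton and doubleton counts enter this bound, which is why it stays conservative in heterogeneous populations and why an estimator that draws on more of the count frequencies can tighten it.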
Taxonomy
Topics: Drug Transport and Resistance Mechanisms · Machine Learning and Algorithms · Renal Transplantation Outcomes and Treatments
