Estimating Diversity via Frequency Ratios
A. Willis, J. Bunge

TL;DR
This paper introduces a nonlinear regression approach, based on ratios of consecutive frequency counts, for estimating the total number of classes (total diversity) in a population. It is especially well suited to high-diversity datasets such as those from microbial sequencing.
Contribution
The authors believe it is the first approach to this problem that departs from the classical mixed Poisson model. It replaces that model with a nonlinear regression on ratios of consecutive frequency counts.
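The probability theory behind this approach characterizes a distribution on the integers by its ratios of consecutive probabilities, p_{j+1}/p_j. The block below gives two standard examples of the shapes such ratios take. These are textbook facts, not results from the paper. It then shows the extrapolation step that turns a fitted ratio into a diversity estimate. The notation f_j, c, \hat r_0 and \hat C is our own labeling, and the paper's specific ratio model is not reproduced here.

```latex
% Poisson with mean \lambda, p_j = e^{-\lambda}\lambda^j / j!:
\frac{p_{j+1}}{p_j} = \frac{\lambda}{j+1}
% Negative binomial, p_j = \binom{j+k-1}{j}(1-p)^k p^j (numerator linear in j):
\frac{p_{j+1}}{p_j} = \frac{p\,(k+j)}{j+1}
% f_j = number of classes observed exactly j times, c = \sum_{j \ge 1} f_j.
% Since f_1 / f_0 \approx p_1 / p_0 = r_0, a fitted ratio \hat r_0 gives
\hat f_0 = \frac{f_1}{\hat r_0}, \qquad \hat C = c + \hat f_0
```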
Findings
Estimates total diversity in high-diversity datasets, such as those from next-generation sequencing
Outperforms existing methods in microbial ecology applications
Provides good fits to data with reasonable standard errors
Abstract
We wish to estimate the total number of classes in a population based on sample counts, especially in the presence of high latent diversity. Drawing on probability theory that characterizes distributions on the integers by ratios of consecutive probabilities, we construct a nonlinear regression model for the ratios of consecutive frequency counts. This allows us to predict the unobserved count and hence estimate the total diversity. We believe that this is the first approach to depart from the classical mixed Poisson model in this problem. Our method is geometrically intuitive and yields good fits to data with reasonable standard errors. It is especially well-suited to analyzing high diversity datasets derived from next-generation sequencing in microbial ecology. We demonstrate the method's performance in this context and via simulation, and we present a dataset for which our method…
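The steps in the abstract are: form ratios of consecutive frequency counts, fit a regression to them, predict the unobserved count, and add it to the observed total. They can be sketched as follows.

This is a minimal, hypothetical illustration, not the authors' model. It assumes a negative-binomial-style ratio, (j + 1) f_{j+1} / f_j ≈ b0 + b1·j. That form is linear in its parameters, so ordinary least squares replaces the paper's nonlinear regression and its weighting. Under this assumption the ratio at j = 0 is b0, so f̂0 = f1 / b0. The function name `ratio_regression_estimate` is our own.

```python
from typing import Dict, Tuple


def ratio_regression_estimate(freq: Dict[int, int]) -> Tuple[float, int, float]:
    """Hypothetical sketch of diversity estimation from frequency ratios.

    freq maps j >= 1 (times a class was observed) to f_j (number of classes
    observed exactly j times). Assumed simplified model, fitted by ordinary
    least squares (not the paper's weighted nonlinear regression):
        (j + 1) * f_{j+1} / f_j  ~  b0 + b1 * j
    Returns (estimated unobserved count f0, observed classes c, total C).
    """
    xs, ys = [], []
    for j in sorted(freq):
        f_j, f_next = freq[j], freq.get(j + 1, 0)
        # Only use ratios where both consecutive counts are nonzero.
        if f_j > 0 and f_next > 0:
            xs.append(float(j))
            ys.append((j + 1) * f_next / f_j)
    if len(xs) < 2:
        raise ValueError("need at least two consecutive nonzero ratios")

    # Ordinary least squares for intercept b0 and slope b1.
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    if b0 <= 0:
        raise ValueError("fitted ratio at j = 0 is not positive")

    # At j = 0 the model gives r_0 = p_1 / p_0 = b0, so f_0 is about f_1 / b0.
    f0_hat = freq.get(1, 0) / b0
    observed = sum(freq.values())
    return f0_hat, observed, observed + f0_hat
```

Consider counts generated exactly from a negative binomial with k = 2 and p = 0.5, scaled so that f1 = 192. For `{1: 192, 2: 144, 3: 96, 4: 60, 5: 36}` the fit recovers b0 = 1 and b1 = 0.5, and the function returns `(192.0, 528, 720.0)`. On real sequencing data, ratios at large j rest on very few classes and are noisy. That is one reason a weighted fit, as the paper's regression framework allows, would be preferable to this unweighted sketch.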
Taxonomy
Topics: Census and Population Estimation · Statistical Methods and Bayesian Inference · HIV, Drug Use, Sexual Risk
