Species richness estimation with high diversity but spurious singletons
Amy Willis

TL;DR
This paper introduces breakaway_nof1, a new species richness estimator that predicts true singletons and unobserved taxa from non-singleton data, improving robustness in high-diversity microbial samples.
Contribution
The novel method estimates true singletons and unobserved taxa using non-singleton counts, enhancing accuracy and robustness in species richness estimation.
Findings
Predicts true singletons and unobserved taxa from non-singleton data
Provides a check of how robust a richness estimate is to the singleton count
Available as an R package on CRAN
Abstract
The presence of uncommon taxa in high-throughput sequenced ecological samples poses challenges to the microbial ecologist, bioinformatician and statistician. It is rarely certain whether these taxa are truly present in the sample or the result of sequencing errors. Unfortunately, alpha-diversity quantification relies on accurate frequency counts, which can rarely be guaranteed. We present a species richness estimation tool which predicts both the number of unobserved taxa and the number of true singletons based on the non-singleton frequency counts. This method can be treated as either inferential (for formally estimating richness) or exploratory (for assessing robustness of the richness estimate to the singleton count). If the estimate, called breakaway_nof1, is comparable to other richness estimators, this provides evidence that the richness estimate is robust to the level of quality…
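To make the terms in the abstract concrete, the following minimal Python sketch builds a frequency count table (f_j is the number of taxa observed exactly j times), extracts the non-singleton counts that an estimator of this kind would take as input, and shows how strongly a singleton-dependent estimator can react to spurious singletons. It is not the breakaway_nof1 method. The classic Chao1 formula is swapped in purely as a familiar singleton-sensitive estimator, and the abundance data and helper names (frequency_counts, drop_singletons, chao1) are hypothetical.

```python
from collections import Counter


def frequency_counts(abundances):
    """Return {j: f_j}, where f_j is the number of taxa observed exactly j times."""
    return dict(sorted(Counter(a for a in abundances if a > 0).items()))


def drop_singletons(freq):
    """Keep only the non-singleton frequency counts (j >= 2)."""
    return {j: f for j, f in freq.items() if j >= 2}


def chao1(freq):
    """Classic Chao1 estimate: S_obs + f_1^2 / (2 * f_2).

    Used here only to illustrate sensitivity to f_1; it is NOT breakaway_nof1.
    """
    s_obs = sum(freq.values())
    f1 = freq.get(1, 0)
    f2 = freq.get(2, 0)
    if f2 == 0:
        raise ValueError("classic Chao1 needs at least one doubleton")
    return s_obs + f1 ** 2 / (2 * f2)


# Hypothetical per-taxon read counts: 40 singletons, 10 doubletons, and so on.
abundances = [1] * 40 + [2] * 10 + [3] * 5 + [4] * 3 + [10, 25]

freq = frequency_counts(abundances)
non_singleton = drop_singletons(freq)

# Simulate 20 extra singletons caused by sequencing error (an assumption).
inflated = dict(freq)
inflated[1] = freq[1] + 20

print(freq)             # {1: 40, 2: 10, 3: 5, 4: 3, 10: 1, 25: 1}
print(non_singleton)    # {2: 10, 3: 5, 4: 3, 10: 1, 25: 1}
print(chao1(freq))      # 140.0
print(chao1(inflated))  # 260.0
```

In this toy data, 20 spurious singletons nearly double the Chao1 estimate, while the non-singleton table is unchanged. That unchanged table is the only input that an estimator built on non-singleton counts would see, which is why such an estimate can serve as a robustness check.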
Taxonomy
Topics: Species Distribution and Climate Change · Genomics and Phylogenetic Studies · Bacterial Identification and Susceptibility Testing
