A read-filtering algorithm for high-throughput marker-gene studies that greatly improves OTU accuracy
Fernando Puente-Sánchez, Jacobo Aguirre, Víctor Parro

TL;DR
This paper introduces a novel read-filtering algorithm that uses error-probability calculations to improve OTU accuracy in high-throughput marker-gene studies, leading to more faithful microbial diversity estimates.
Contribution
The paper presents a new, sensitive filtering method that outperforms existing approaches by retaining more reads and producing more accurate OTUs.
Findings
Retained more reads than previous methods
Produced fewer, more accurate OTUs
Enhanced representation of microbial diversity
Abstract
Adequate read filtering is critical when processing high-throughput data in marker-gene-based studies. Sequencing errors can cause the mis-clustering of otherwise similar reads, artificially increasing the number of retrieved Operational Taxonomic Units (OTUs) and therefore leading to the overestimation of microbial diversity. Sequencing errors will also result in OTUs that are not accurate reconstructions of the original biological sequences. Herein we present a novel and sensitive sequence filtering algorithm that minimizes both problems by calculating the exact error-probability distribution of a sequence from its quality scores. In order to validate our method, we quality-filtered thirty-seven publicly available datasets obtained by sequencing mock and environmental microbial communities with the Roche 454, Illumina MiSeq and IonTorrent PGM platforms, and compared our results to…
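To make the central idea of the abstract concrete, here is a minimal Python sketch of how an exact distribution of the number of errors in a read might be computed from its quality scores. It assumes that each base call is an independent Bernoulli trial with error probability p = 10^(-Q/10) (the standard Phred definition), so the error count follows a Poisson binomial distribution. The function names, the `max_errors` and `min_confidence` parameters, and the acceptance rule in `keep_read` are hypothetical illustrations; the abstract is truncated here and does not state the authors' actual filtering criterion.

```python
from typing import List, Sequence


def phred_to_error_prob(q: int) -> float:
    """Convert a Phred quality score to a per-base error probability."""
    return 10 ** (-q / 10)


def error_count_distribution(quals: Sequence[int]) -> List[float]:
    """Return P(exactly k errors) for k = 0..len(quals).

    Assumes independent per-base errors (Poisson binomial distribution),
    computed exactly by dynamic programming over the bases.
    """
    dist = [1.0]  # with zero bases seen, P(0 errors) = 1
    for q in quals:
        p = phred_to_error_prob(q)
        new = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            new[k] += mass * (1 - p)  # this base is correct
            new[k + 1] += mass * p    # this base is an error
        dist = new
    return dist


def prob_at_most(quals: Sequence[int], max_errors: int) -> float:
    """Cumulative probability that the read has at most max_errors errors."""
    dist = error_count_distribution(quals)
    return sum(dist[: max_errors + 1])


def keep_read(quals: Sequence[int], max_errors: int, min_confidence: float) -> bool:
    """Hypothetical acceptance rule (not taken from the paper): keep the read
    if P(errors <= max_errors) reaches min_confidence."""
    return prob_at_most(quals, max_errors) >= min_confidence
```

For example, `keep_read([30] * 250, max_errors=2, min_confidence=0.99)` would ask whether a 250-base read with uniform Q30 scores has at most two errors with 99% probability. Working from the full distribution, rather than only the expected number of errors, is what allows a threshold to be expressed as a probability statement of this kind.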
Taxonomy
Topics: Genomics and Phylogenetic Studies · Microbial Community Ecology and Physiology · Gene expression and cancer classification
