A read-filtering algorithm for high-throughput marker-gene studies that greatly improves OTU accuracy

Fernando Puente-Sánchez; Jacobo Aguirre; Víctor Parro

arXiv:1506.00453 · q-bio.QM · June 2, 2015


TL;DR

This paper introduces a novel read-filtering algorithm that uses error-probability calculations to improve OTU accuracy in high-throughput marker-gene studies, leading to more faithful microbial diversity estimates.

Contribution

The paper presents a new, sensitive filtering method that outperforms existing approaches by retaining more reads and producing more accurate OTUs.

Findings

1. Retained more reads than previous methods
2. Produced fewer, more accurate OTUs
3. Enhanced representation of microbial diversity

Abstract

Adequate read filtering is critical when processing high-throughput data in marker-gene-based studies. Sequencing errors can cause the mis-clustering of otherwise similar reads, artificially increasing the number of retrieved Operational Taxonomic Units (OTUs) and therefore leading to the overestimation of microbial diversity. Sequencing errors will also result in OTUs that are not accurate reconstructions of the original biological sequences. Herein we present a novel and sensitive sequence filtering algorithm that minimizes both problems by calculating the exact error-probability distribution of a sequence from its quality scores. In order to validate our method, we quality-filtered thirty-seven publicly available datasets obtained by sequencing mock and environmental microbial communities with the Roche 454, Illumina MiSeq and IonTorrent PGM platforms, and compared our results to…
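The abstract's central idea, deriving the exact error-probability distribution of a read from its quality scores, can be illustrated with a short sketch. It assumes standard Phred encoding (per-base error probability p = 10^(-Q/10)) and independent errors across bases, so the number of errors in a read follows a Poisson-binomial distribution that a simple dynamic program computes exactly. The acceptance rule in `passes_filter` and its parameters `max_errors` and `min_confidence` are hypothetical illustrations, not the authors' published criterion or defaults.

```python
from typing import List


def phred_to_error_prob(q: int) -> float:
    """Convert a Phred quality score to a per-base error probability: p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_count_distribution(quals: List[int]) -> List[float]:
    """Exact distribution of the number of errors in a read.

    Assumes per-base errors are independent (a Poisson-binomial model).
    Returns dist where dist[k] = P(exactly k errors in the read).
    """
    dist = [1.0]  # before any base is seen, P(0 errors) = 1
    for q in quals:
        p = phred_to_error_prob(q)
        new = [0.0] * (len(dist) + 1)
        for k, prob in enumerate(dist):
            new[k] += prob * (1 - p)  # this base is called correctly
            new[k + 1] += prob * p    # this base is an error
        dist = new
    return dist


def passes_filter(quals: List[int], max_errors: int = 1,
                  min_confidence: float = 0.99) -> bool:
    """Hypothetical rule: keep the read if P(errors <= max_errors) >= min_confidence."""
    dist = error_count_distribution(quals)
    return sum(dist[: max_errors + 1]) >= min_confidence


if __name__ == "__main__":
    high_q = [40] * 100  # 100 bases at Q40 (p = 0.0001 each)
    low_q = [20] * 100   # 100 bases at Q20 (p = 0.01 each)
    print(passes_filter(high_q))  # True
    print(passes_filter(low_q))   # False
```

Unlike summing per-base probabilities into a single expected-error value, the full distribution lets a filter ask how likely a read is to contain at most a given number of errors. In the example, the Q20 read has roughly a 74% chance of carrying one error or fewer, so it fails the hypothetical 99% threshold.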

Taxonomy

Topics: Genomics and Phylogenetic Studies · Microbial Community Ecology and Physiology · Gene expression and cancer classification