Estimation of genetic diversity in viral populations from next generation sequencing data with extremely deep coverage
Jean P. Zukurov, Sieberth do Nascimento-Brito, Angela C. Volpini, Guilherme Oliveira, Luiz Mario R. Janini, Fernando Antoneli

TL;DR
This paper introduces Tanden, a novel Bayesian method leveraging deep coverage and low error rates of SOLiD sequencing to accurately estimate viral genetic diversity at the population level, even with short reads.
Contribution
The paper presents a new approach for viral diversity estimation that effectively utilizes short, high-depth reads from SOLiD sequencing through Bayesian modeling and optimized read mapping.
Findings
Effective estimation of viral diversity from short reads.
Increased accuracy in separating signal from noise.
Implementation of a comprehensive tool, Tanden.
Abstract
In this paper we propose a method and discuss its computational implementation as an integrated tool for the analysis of viral genetic diversity on data generated by high-throughput sequencing. Most methods for viral diversity estimation proposed so far are intended to exploit the longer reads produced by some NGS platforms in order to estimate a population of haplotypes. Our goal here is to take advantage of the distinct virtues of a certain kind of NGS platform (the SOLiD platform from Life Technologies is an example) that has not received much attention due to the short length of its reads, which renders haplotype estimation very difficult. However, this kind of platform has a very low error rate and extremely deep coverage per site, and our method is designed to make use of these characteristics. We propose to measure the populational genetic diversity through a family of…
