Accelerating SARS-CoV-2 low frequency variant calling on ultra deep sequencing datasets
Bryce Kille, Yunxi Liu, Nicolae Sapoval, Michael Nute, Lawrence Rauchwerger, Nancy Amato, Todd J. Treangen

TL;DR
This paper speeds up LoFreq, a low-frequency variant caller, and simplifies its parallel execution so that ultra-deep SARS-CoV-2 sequencing datasets can be analyzed more efficiently.
Contribution
The paper describes specific modifications to LoFreq that reduce its runtime and simplify its parallel interface, supporting both multithreading and distribution of jobs across a cluster.
Findings
Enhanced runtime performance of LoFreq
Simplified multithreading and cluster distribution
More practical low-frequency variant detection at very high depth of coverage
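One common way to parallelize variant calling is to split the genome into contiguous regions and hand each region to a separate worker. The sketch below illustrates that idea only; this page does not say whether the modified LoFreq partitions work this way. The names `split_regions`, `call_region`, and `call_in_parallel` are hypothetical, and `call_region` is a placeholder rather than a real variant caller.

```python
from concurrent.futures import ThreadPoolExecutor


def split_regions(genome_length, n_chunks):
    """Split 1-based positions 1..genome_length into contiguous (start, end) ranges."""
    n_chunks = max(1, min(n_chunks, genome_length))
    base, extra = divmod(genome_length, n_chunks)
    regions, start = [], 1
    for i in range(n_chunks):
        # The first `extra` chunks get one additional position each.
        size = base + (1 if i < extra else 0)
        regions.append((start, start + size - 1))
        start += size
    return regions


def call_region(region):
    """Hypothetical placeholder: a real worker would run a variant caller on this region."""
    start, end = region
    return {"region": region, "positions_scanned": end - start + 1}


def call_in_parallel(genome_length, n_workers):
    """Process every region concurrently and return results in genome order."""
    regions = split_regions(genome_length, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(call_region, regions))


if __name__ == "__main__":
    # 29,903 bp is the length of the SARS-CoV-2 reference genome (NC_045512.2).
    for result in call_in_parallel(29903, 4):
        print(result)
```

Threads are adequate when each worker mostly waits on an external process. For CPU-bound Python work, a process pool would be the usual substitute, and on a cluster each region could instead become a separate scheduler job.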
Abstract
With recent advances in sequencing technology, it has become affordable and practical to sequence genomes to very high depth of coverage, allowing researchers to discover low-frequency variants in the genome. However, because of sequencing errors, developing algorithms that can separate noise from true variants remains an active area of research. LoFreq is a state-of-the-art algorithm for low-frequency variant detection, but it has a relatively long runtime compared to other tools. In addition, its interface for running in parallel could be simplified to allow for multithreading as well as distributing jobs to a cluster. In this work we describe some specific contributions to LoFreq that remedy these issues.
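To make the noise-versus-variant problem concrete, the sketch below asks how likely it is that sequencing errors alone would produce at least the observed number of non-reference bases at one position. It is a deliberately simplified illustration and not LoFreq's model: it assumes a single uniform per-base error rate, and the function names and the example numbers are hypothetical.

```python
import math


def log_binom_pmf(i, n, p):
    """Log-probability of exactly i errors among n bases, each wrong with probability p."""
    return (
        math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
        + i * math.log(p) + (n - i) * math.log1p(-p)
    )


def error_only_pvalue(alt_count, depth, error_rate):
    """P(X >= alt_count) if every non-reference base were a sequencing error.

    Assumes 0 < error_rate < 1 and one shared error rate for all bases.
    """
    if alt_count <= 0:
        return 1.0
    tail = sum(
        math.exp(log_binom_pmf(i, depth, error_rate))
        for i in range(alt_count, depth + 1)
    )
    return min(1.0, tail)


if __name__ == "__main__":
    depth, error_rate = 10_000, 0.001  # illustrative values, about 10 errors expected
    for alt_count in (10, 25, 50):
        print(alt_count, error_only_pvalue(alt_count, depth, error_rate))
```

With about 10 errors expected at this depth, 10 non-reference bases are unremarkable while 50 are very unlikely to be noise. The sum over the tail also shows why cost grows with depth, which is one reason runtime matters on ultra-deep datasets.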
Taxonomy
Topics: Genomics and Phylogenetic Studies · Algorithms and Data Compression · Machine Learning in Bioinformatics
