Rapid Sequence Identification of Potential Pathogens Using Techniques from Sparse Linear Algebra

Stephanie Dodson; Darrell O. Ricke; Jeremy Kepner; Nelson Chiu; Anna Shcherbina

arXiv:1501.05353·q-bio.QM·April 13, 2017


TL;DR

The paper introduces D4RAGenS, a fast and accurate genetic sequence identification algorithm that uses linear algebra and subsampling to handle large genomic datasets efficiently, with applications in biodefense and diagnostics.

Contribution

It presents a novel sequence identification method leveraging D4M and linear algebra, offering two modes for speed and accuracy tradeoffs, suitable for large-scale genomic data analysis.

Findings

01

Demonstrates high accuracy in pathogen identification

02

Achieves significant speed improvements over existing methods

03

Validated on several datasets, including three used in the DTRA metagenomic algorithm contest

Abstract

The decreasing costs and increasing speed and accuracy of DNA sample collection, preparation, and sequencing have rapidly produced an enormous volume of genetic data. However, fast and accurate analysis of the samples remains a bottleneck. Here we present D$^{4}$RAGenS, a genetic sequence identification algorithm that exhibits the Big Data handling and computational power of the Dynamic Distributed Dimensional Data Model (D4M). The method leverages linear algebra and statistical properties to increase computational performance while retaining accuracy by subsampling the data. Two run modes, Fast and Wise, yield speed and precision tradeoffs, with applications in biodefense and medical diagnostics. The D$^{4}$RAGenS analysis algorithm is tested over several datasets, including three utilized for the Defense Threat Reduction Agency (DTRA) metagenomic algorithm contest.
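To make the linear-algebra framing concrete, the following is a minimal sketch, not the authors' implementation and not the D4M API. It assumes sequences are broken into k-mers (fixed-length substrings), each collection is stored as a sparse 0/1 matrix with one row per sequence and one column per k-mer, and matching is the sparse product of the sample matrix with the transposed reference matrix. The names `kmers`, `sparse_rows`, and `shared_kmer_counts`, the toy sequences, the value of `k`, and the `keep` subsampling fraction are all hypothetical choices made for illustration.

```python
import random
from collections import defaultdict


def kmers(seq, k):
    """Return the set of distinct length-k substrings (k-mers) of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def sparse_rows(seqs, k, keep=1.0, seed=0):
    """Build a sparse 0/1 matrix as {sequence id: set of k-mer columns}.

    keep < 1.0 retains a random fraction of each row's k-mers. This is an
    assumed, simplified form of subsampling, not the paper's scheme.
    """
    rng = random.Random(seed)
    rows = {}
    for name, seq in seqs.items():
        cols = sorted(kmers(seq, k))
        if keep < 1.0 and cols:
            n = min(len(cols), max(1, round(keep * len(cols))))
            cols = rng.sample(cols, n)
        rows[name] = set(cols)
    return rows


def shared_kmer_counts(sample, reference):
    """Compute the sparse product sample * reference^T.

    Entry (s, r) is the number of k-mers that s and r have in common;
    pairs with no shared k-mer are simply absent from the result.
    """
    # Transpose the reference matrix: k-mer -> reference ids containing it.
    by_kmer = defaultdict(set)
    for ref_id, cols in reference.items():
        for kmer in cols:
            by_kmer[kmer].add(ref_id)
    product = defaultdict(int)
    for s_id, cols in sample.items():
        for kmer in cols:
            for ref_id in by_kmer.get(kmer, ()):
                product[(s_id, ref_id)] += 1
    return dict(product)


# Toy data (hypothetical): two reference sequences and two sample reads.
reference = {"refA": "ACGTACGTGGCA", "refB": "TTTTGGGGCCCC"}
sample = {"read1": "GTACGTGG", "read2": "GGGGCC"}

counts = shared_kmer_counts(sparse_rows(sample, k=4),
                            sparse_rows(reference, k=4))
# Keeping only half of each reference row shrinks the matrix being multiplied.
half_reference = sparse_rows(reference, k=4, keep=0.5)
sub_counts = shared_kmer_counts(sparse_rows(sample, k=4), half_reference)
```

In this sketch each read shares k-mers with only one reference, so the nonzero entries of `counts` point directly at the likely source sequence. Subsampling reduces the number of nonzeros that enter the product, which is the kind of speed-for-precision tradeoff the abstract describes; which matrix is subsampled, and by how much in each run mode, is left open here.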
