Accelerating the Understanding of Life's Code Through Better Algorithms and Hardware Design

Mohammed H K Alser

arXiv:1910.03936·q-bio.GN·October 10, 2019

Open Access

TL;DR

This thesis introduces four pre-alignment filtering algorithms (GateKeeper, Shouji, MAGNET, and SneakySnake) that discard most incorrect candidate locations before sequence alignment, sharply reducing the time spent on alignment in genomic analysis. The filters are accelerated on FPGAs, and SneakySnake also has a CPU implementation.

Contribution

The thesis presents four new pre-alignment filtering algorithms designed for high filtering accuracy and FPGA acceleration, together with a CPU implementation of SneakySnake for broader applicability.

Findings

01 Hardware-accelerated filters achieve a 1000x speedup over their CPU implementations.

02 Integrating the pre-alignment filters into a read aligner reduces the aligner's execution time by up to 21.5x.

03 The CPU implementation of SneakySnake reduces alignment time by up to 57.9x.

Abstract

Calculating the similarities between a pair of genomic sequences is one of the most fundamental computational steps in genomic analysis. This step -- called sequence alignment -- is the computational bottleneck because: (1) it is implemented using quadratic-time dynamic programming algorithms and (2) the majority of candidate locations in the reference genome do not align with a given read due to high dissimilarity. Calculating the alignment of such incorrect candidate locations consumes an overwhelming majority of a modern read mapper's execution time. In this thesis, we introduce four new algorithms (GateKeeper, Shouji, MAGNET, and SneakySnake) that function as a pre-alignment step and aim to filter out most incorrect candidate locations. The first key idea of our pre-alignment filters is to provide high filtering accuracy by correctly detecting all similar segments shared between…
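
The filter-then-align idea described in the abstract can be illustrated with a minimal Python sketch. It is not GateKeeper, Shouji, MAGNET, or SneakySnake; it swaps in a much simpler character-count lower bound on edit distance as a stand-in filter. The function names (`edit_distance`, `count_lower_bound`, `align_candidates`) and the `threshold` parameter are assumptions made for illustration only.

```python
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Quadratic-time dynamic programming (Levenshtein) edit distance."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def count_lower_bound(a: str, b: str) -> int:
    """Cheap lower bound on edit distance from character counts.

    Stand-in filter (an assumption, not one of the thesis's algorithms):
    every edit removes at most one surplus character from each side, so
    the larger surplus can never exceed the true edit distance.
    """
    ca, cb = Counter(a), Counter(b)
    surplus_a = sum((ca - cb).values())  # characters a has in excess of b
    surplus_b = sum((cb - ca).values())  # characters b has in excess of a
    return max(surplus_a, surplus_b)


def align_candidates(read: str, candidates: list[str], threshold: int):
    """Run the cheap filter first; align only the pairs that survive it."""
    accepted = []
    for ref in candidates:
        if count_lower_bound(read, ref) > threshold:
            continue  # filtered out: cannot be within the threshold
        distance = edit_distance(read, ref)  # expensive step
        if distance <= threshold:
            accepted.append((ref, distance))
    return accepted
```

Because the bound never exceeds the true edit distance, this stand-in filter never rejects a pair that is within the threshold; it only skips the quadratic computation for pairs that cannot pass. A pair that survives the filter may still fail alignment, which is why the full edit distance is checked afterwards.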

Taxonomy

Topics: Genomics and Phylogenetic Studies · Algorithms and Data Compression · Machine Learning in Bioinformatics