Accelerating the Understanding of Life's Code Through Better Algorithms and Hardware Design
Mohammed H K Alser

TL;DR
This thesis introduces four novel pre-alignment filtering algorithms, accelerated on FPGAs and, in the case of SneakySnake, also implemented for CPUs. The filters drastically reduce the computational time of genomic sequence alignment and significantly improve the efficiency of genomic analysis.
Contribution
The thesis presents four new pre-alignment filtering algorithms that are highly accurate and efficiently accelerated on FPGA hardware, along with a practical CPU implementation for broad applicability.
Findings
Hardware filters achieve 1000x speedup over CPU implementations.
Integrating the hardware filters with a read aligner reduces the aligner's execution time by up to 21.5x.
The CPU implementation of SneakySnake reduces alignment time by up to 57.9x.
Abstract
Calculating the similarities between a pair of genomic sequences is one of the most fundamental computational steps in genomic analysis. This step, called sequence alignment, is the computational bottleneck because: (1) it is implemented using quadratic-time dynamic programming algorithms and (2) the majority of candidate locations in the reference genome do not align with a given read due to high dissimilarity. Calculating the alignment of such incorrect candidate locations consumes an overwhelming majority of a modern read mapper's execution time. In this thesis, we introduce four new algorithms (GateKeeper, Shouji, MAGNET, and SneakySnake) that function as a pre-alignment step and aim to filter out most incorrect candidate locations. The first key idea of our pre-alignment filters is to provide high filtering accuracy by correctly detecting all similar segments shared between…
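To make the pre-alignment idea concrete, the sketch below places a cheap filter in front of the quadratic-time dynamic program, so that only candidate locations that might be within the edit threshold reach the expensive step. It is a minimal, illustrative sketch and not the thesis's FPGA or optimized CPU implementation. It assumes equal-length read and reference segments, Levenshtein distance with a threshold `max_edits` (E), and a simplified SneakySnake-style greedy walk: diagonals shifted by -E..+E are scanned for the longest exact-match run from the current column, and every column that no run can cross is counted as one edit. All function names are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the classic O(len(a) * len(b)) dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            curr[j] = min(prev[j] + 1,               # deletion
                          curr[j - 1] + 1,           # insertion
                          prev[j - 1] + (ca != cb))  # match or substitution
        prev = curr
    return prev[-1]


def longest_match_run(read: str, ref: str, start: int, max_edits: int) -> int:
    """Longest run of exact matches beginning at read column `start`,
    taken over all diagonals read[i] vs ref[i + k] with -E <= k <= E."""
    best = 0
    for k in range(-max_edits, max_edits + 1):
        i = start
        while i < len(read) and 0 <= i + k < len(ref) and read[i] == ref[i + k]:
            i += 1
        best = max(best, i - start)
    return best


def sneaky_snake_like_filter(read: str, ref: str, max_edits: int) -> bool:
    """Hypothetical simplified SneakySnake-style filter.
    True  -> pair may be within max_edits, so pass it on to alignment.
    False -> pair needs more than max_edits edits and can be rejected."""
    assert len(read) == len(ref), "sketch assumes equal-length segments"
    checkpoint, edits = 0, 0
    while checkpoint < len(read):
        # Greedily travel as far as any diagonal allows without an edit.
        checkpoint += longest_match_run(read, ref, checkpoint, max_edits)
        if checkpoint < len(read):
            edits += 1          # this column blocks every diagonal: count one edit
            if edits > max_edits:
                return False    # reject without running the dynamic program
            checkpoint += 1     # step over the blocking column
    return True


def align_candidates(read: str, genome: str, candidates: list[int], max_edits: int):
    """Run the cheap filter first; only surviving candidates reach the DP."""
    accepted = []
    for pos in candidates:
        ref = genome[pos:pos + len(read)]
        if len(ref) != len(read):
            continue  # candidate runs past the end of the genome
        if not sneaky_snake_like_filter(read, ref, max_edits):
            continue  # filtered out: no quadratic-time alignment needed
        d = edit_distance(read, ref)
        if d <= max_edits:
            accepted.append((pos, d))
    return accepted


if __name__ == "__main__":
    read = "ACGTACGTAC"
    genome = "ACGTTCGTAC" + "GGGGGGGGGG"
    print(align_candidates(read, genome, [0, 10], max_edits=2))  # [(0, 1)]
```

In this usage example, the candidate at position 0 differs from the read by one substitution and passes both stages, while the candidate at position 10 is rejected by the filter alone, so its dynamic-programming alignment is never computed. Because each counted edit corresponds to at least one edit in any valid alignment within E, this greedy count stays at or below the true edit distance, so the sketch should not reject pairs that are actually within the threshold.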
Taxonomy
Topics: Genomics and Phylogenetic Studies · Algorithms and Data Compression · Machine Learning in Bioinformatics
