Efficient Architecture-Aware Acceleration of BWA-MEM for Multicore Systems
Vasimuddin Md, Sanchit Misra, Heng Li, and Srinivas Aluru

TL;DR
This paper presents an architecture-aware implementation of BWA-MEM that runs significantly faster on multicore processors while producing identical output, enabling faster genomic data analysis.
Contribution
The authors developed a set of architecture-aware optimizations for the key BWA-MEM kernels, achieving up to 3.5x end-to-end speedup in single-threaded execution.
Findings
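Nearly 2x, 183x, and 8x speedups on the three key kernels: SMEM search, suffix array lookup (SAL), and banded Smith-Waterman (BSW), respectively.
Up to 3.5x (single thread) and 2.4x (single socket) speedup on end-to-end compute time over the original BWA-MEM.
According to the authors, the highest reported speedup over BWA-MEM to date while keeping the output identical.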
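The gap between the per-kernel and end-to-end figures follows from Amdahl's law: a kernel's speedup only helps in proportion to its share of the original runtime. The sketch below illustrates that arithmetic. The function name `overall_speedup` and the runtime shares are hypothetical and chosen only for illustration; they are not taken from the paper. Only the three kernel speedups come from the findings above.

```python
def overall_speedup(fractions, speedups):
    """Amdahl-style combination of per-kernel speedups.

    fractions[i] is the share of the ORIGINAL runtime spent in kernel i,
    speedups[i] is how much faster that kernel becomes. Whatever runtime
    is not covered by the kernels is assumed to stay unchanged.
    """
    if len(fractions) != len(speedups):
        raise ValueError("fractions and speedups must have the same length")
    covered = sum(fractions)
    if covered < 0 or covered > 1.0 + 1e-12:
        raise ValueError("runtime fractions must sum to a value in [0, 1]")
    untouched = max(0.0, 1.0 - covered)
    new_time = untouched + sum(f / s for f, s in zip(fractions, speedups))
    return 1.0 / new_time


# Kernel speedups as reported in the findings (nearly 2x, 183x, 8x).
kernel_speedups = [2.0, 183.0, 8.0]

# HYPOTHETICAL runtime shares, for illustration only (not from the paper).
hypothetical_fractions = [0.30, 0.20, 0.35]

print(overall_speedup(hypothetical_fractions, kernel_speedups))
```

Under these assumed shares the 183x kernel almost vanishes from the runtime, and the total is then dominated by the kernel with the smallest speedup plus the unaccelerated remainder. This is why a very large single-kernel speedup can coexist with a low single-digit end-to-end speedup.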
Abstract
Innovations in Next-Generation Sequencing are enabling generation of DNA sequence data at ever faster rates and at very low cost. Large sequencing centers typically employ hundreds of sequencing systems. Such high-throughput and low-cost generation of data underscores the need for commensurate acceleration in downstream computational analysis of the sequencing data. A fundamental step in downstream analysis is mapping of the reads to a long reference DNA sequence, such as a reference human genome. Sequence mapping is a compute-intensive step that accounts for more than 30% of the overall time of the GATK workflow. BWA-MEM is one of the most widely used tools for sequence mapping and has tens of thousands of users. In this work, we focus on accelerating BWA-MEM through an efficient architecture-aware implementation, while maintaining identical output. The volume of data requires distributed…
Taxonomy
Topics: Algorithms and Data Compression · Genomics and Phylogenetic Studies · Network Packet Processing and Optimization
