DSA: Scalable Distributed Sequence Alignment System Using SIMD Instructions

Bo Xu; Changlong Li; Hang Zhuang; Jiali Wang; Qingfeng Wang; Jinhong Zhou; Xuehai Zhou

arXiv:1701.01575 · cs.DC · January 9, 2017

TL;DR

DSA is a scalable distributed sequence alignment system that combines Spark and SIMD instructions to significantly accelerate bioinformatics sequence alignment tasks in distributed environments.

Contribution

The paper introduces DSA, a novel system that integrates Spark with SIMD-based data parallelism for efficient, scalable sequence alignment.
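The distributed half of this design can be pictured as a map/reduce over a partitioned reference database. The sketch below is illustrative only and makes several assumptions: Smith-Waterman local alignment (the algorithm SparkSW targets) as the per-pair kernel, a linear gap penalty with made-up scores, and hypothetical helpers `sw_score`, `score_partition` and `distributed_search`. Partitions are processed in a plain loop here; a real Spark job might express the map step with `RDD.mapPartitions` and the merge with `takeOrdered`, but the abstract does not say how DSA is structured internally.

```python
from heapq import nlargest

# Hypothetical scoring parameters; the abstract does not specify DSA's scoring scheme.
MATCH, MISMATCH, GAP = 2, -1, -2


def sw_score(query: str, target: str) -> int:
    """Scalar Smith-Waterman local alignment score with a linear gap penalty."""
    prev = [0] * (len(target) + 1)
    best = 0
    for q in query:
        curr = [0] * (len(target) + 1)
        for j, t in enumerate(target, start=1):
            diag = prev[j - 1] + (MATCH if q == t else MISMATCH)
            curr[j] = max(0, diag, prev[j] + GAP, curr[j - 1] + GAP)
            best = max(best, curr[j])
        prev = curr
    return best


def score_partition(query, partition, k):
    """Map step (hypothetical): score each (seq_id, sequence) record, keep the local top-k."""
    return nlargest(k, ((sw_score(query, seq), seq_id) for seq_id, seq in partition))


def distributed_search(query, database, num_partitions=3, k=2):
    """Hypothetical driver: partition the database, score partitions independently, merge top-k."""
    partitions = [database[i::num_partitions] for i in range(num_partitions)]
    # On a cluster, each partition could be scored by a different executor core.
    local_tops = [score_partition(query, part, k) for part in partitions]
    # Reduce step: merge the per-partition winners into a global top-k.
    return nlargest(k, (hit for top in local_tops for hit in top))


db = [("a", "ACGT"), ("b", "TTTT"), ("c", "ACGTACGT"), ("d", "GGGG")]
print(distributed_search("ACGT", db))  # [(8, 'c'), (8, 'a')]
```

Because each partition keeps its own top-k before the merge, only k hits per partition travel to the reducer, and the result matches a single-partition search.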

Findings

1. Achieves up to 201x speedup over SparkSW.

2. Exhibits near-linear scalability as the number of cluster nodes increases.

3. Demonstrates outstanding performance in distributed sequence alignment.

Abstract

Sequence alignment algorithms are a basic and critical component of many bioinformatics fields. With the rapid development of sequencing technology, fast-growing reference database volumes and longer query sequences have become new challenges for sequence alignment. At the same time, the algorithm's time and space complexity is prohibitively high. In this paper, we present DSA, a scalable distributed sequence alignment system that employs Spark to process sequence data in a horizontally scalable distributed environment, and leverages a data-parallel strategy based on Single Instruction Multiple Data (SIMD) instructions to parallelize the algorithm on each core of a worker node. The experimental results demonstrate that 1) DSA has outstanding performance and achieves up to 201x speedup over SparkSW. 2) DSA has excellent scalability and achieves near-linear speedup when increasing the…
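The abstract does not say how DSA lays out its SIMD lanes. One well-known option is an inter-sequence layout, where each vector lane holds the dynamic-programming state for a different database sequence, so a single instruction advances several alignments at once. The pure-Python sketch below models that idea under stated assumptions: `vadd` and `vmax` are hypothetical stand-ins for vector add and max instructions, lists play the role of registers, targets are padded with a non-matching symbol, and the Smith-Waterman scores are invented for illustration.

```python
# Hypothetical scoring parameters (not taken from the paper).
MATCH, MISMATCH, GAP = 2, -1, -2
PAD = "*"  # padding symbol; never matches a query residue


def vadd(a, b):
    """Lane-wise addition of two 'vectors' (models one SIMD add)."""
    return [x + y for x, y in zip(a, b)]


def vmax(*vs):
    """Lane-wise maximum across vectors (models a chain of SIMD max instructions)."""
    return [max(lane) for lane in zip(*vs)]


def sw_batch(query, targets):
    """Smith-Waterman scores of one query against all targets, one lane per target."""
    lanes = len(targets)
    width = max(len(t) for t in targets)
    padded = [t.ljust(width, PAD) for t in targets]
    zero = [0] * lanes
    gap = [GAP] * lanes
    # prev[j] is the lane vector for column j of the previous DP row (never mutated).
    prev = [zero] * (width + 1)
    best = zero
    for q in query:
        curr = [zero]
        for j in range(1, width + 1):
            # Substitution vector: each lane compares q with its own target's residue.
            sub = [MATCH if p[j - 1] == q else MISMATCH for p in padded]
            h = vmax(zero, vadd(prev[j - 1], sub), vadd(prev[j], gap), vadd(curr[j - 1], gap))
            curr.append(h)
            best = vmax(best, h)
        prev = curr
    return best


print(sw_batch("ACGT", ["ACGT", "TTTT", "ACGTACGT", "GG", "ACT"]))  # [8, 2, 8, 2, 4]
```

Lanes in this layout never depend on each other, which is what makes it SIMD-friendly; the cost is padding shorter sequences, which is harmless here because padded cells only accumulate mismatch and gap penalties and so cannot exceed a real cell's score.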

Taxonomy

Topics: Genomics and Phylogenetic Studies · Algorithms and Data Compression · Gene expression and cancer classification