Analyzing large-scale DNA Sequences on Multi-core Architectures

Suejb Memeti; Sabri Pllana

arXiv:1509.01506·cs.DC·September 7, 2015

TL;DR

This paper introduces a scalable parallel method based on Finite Automata for analyzing large DNA sequences on multi-core systems. It achieves speed-ups of up to 17.6x on 24 cores and runs up to 3x faster than an RE2-based approach.

Contribution

The paper presents a novel parallel DNA analysis approach based on Finite Automata. It is designed for very large datasets and multi-core architectures, and it outperforms an RE2-based pattern-matching approach.

Findings

1. Achieved speed-ups of up to 17.6x on 24 cores.
2. Handled real-world DNA segments of up to 3.2 GB.
3. Ran up to 3x faster than RE2-based pattern matching.
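The first finding can be read more concretely with some simple arithmetic. The reported speed-up is S_p = 17.6 on p = 24 cores. The first line below gives the parallel efficiency E. The second line gives the serial fraction f implied by that speed-up, but only under an idealized Amdahl's-law model. That model is an assumption made here for illustration; the paper does not state it.

$$E = \frac{S_p}{p} = \frac{17.6}{24} \approx 0.73$$

$$S_p = \frac{1}{f + (1 - f)/p} \;\Rightarrow\; f = \frac{p/S_p - 1}{p - 1} = \frac{24/17.6 - 1}{23} \approx 0.016$$

This simple model folds every source of lost efficiency on a dual-socket machine into the single number f, so it should be taken only as a rough indicator.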

Abstract

Rapid analysis of DNA sequences is important in preventing the evolution of different viruses and bacteria during an early phase, early diagnosis of genetic predispositions to certain diseases (cancer, cardiovascular diseases), and in DNA forensics. However, real-world DNA sequences may comprise several Gigabytes and the process of DNA analysis demands adequate computational resources to be completed within a reasonable time. In this paper we present a scalable approach for parallel DNA analysis that is based on Finite Automata, and which is suitable for analyzing very large DNA segments. We evaluate our approach for real-world DNA segments of mouse (2.7GB), cat (2.4GB), dog (2.4GB), chicken (1GB), human (3.2GB) and turkey (0.2GB). Experimental results on a dual-socket shared-memory system with 24 physical cores show speed-ups of up to 17.6x. Our approach is up to 3x faster than a…
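The abstract says the approach is based on Finite Automata, but it does not describe how the work is split across cores. Below is a minimal Python sketch of one well-known way to parallelize DFA matching, assuming a single pattern over the alphabet ACGT:

- split the sequence into chunks;
- simulate each chunk from every possible start state in parallel;
- combine the per-chunk results in one cheap sequential pass.

This is an illustration, not the authors' implementation. The names `build_dfa`, `run_chunk` and `parallel_count` are hypothetical. The KMP-style automaton is a stand-in for whatever automata the paper actually builds.

```python
# Hypothetical sketch of chunked parallel DFA matching; not the paper's code.
from concurrent.futures import ThreadPoolExecutor

ALPHABET = "ACGT"  # assumption: the pattern uses only these symbols


def build_dfa(pattern: str) -> list[dict[str, int]]:
    """KMP-style DFA: state q = length of the longest prefix of `pattern`
    that is a suffix of the symbols read so far; state len(pattern) accepts."""
    m = len(pattern)
    dfa = [dict.fromkeys(ALPHABET, 0) for _ in range(m + 1)]
    dfa[0][pattern[0]] = 1
    x = 0  # state the DFA would reach after reading pattern[1:q]
    for q in range(1, m + 1):
        dfa[q] = dict(dfa[x])  # mismatch transitions: behave like state x
        if q < m:
            dfa[q][pattern[q]] = q + 1  # match transition: extend the prefix
            x = dfa[x][pattern[q]]
    return dfa


def run_chunk(dfa, chunk, state):
    """Run the DFA over `chunk` from `state`; return (final state, matches)."""
    accept = len(dfa) - 1
    count = 0
    for c in chunk:
        state = dfa[state].get(c, 0)  # symbols outside ACGT (e.g. 'N') reset
        if state == accept:
            count += 1
    return state, count


def parallel_count(text, pattern, n_chunks=4):
    """Count (possibly overlapping) occurrences of `pattern` in `text`."""
    dfa = build_dfa(pattern)
    size = max(1, -(-len(text) // n_chunks))  # ceiling division
    chunks = [text[i:i + size] for i in range(0, len(text), size)]

    def table(chunk):
        # A chunk's true start state is unknown until the chunks before it
        # are done, so simulate it from every possible start state.
        return [run_chunk(dfa, chunk, s) for s in range(len(dfa))]

    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        tables = list(pool.map(table, chunks))

    # Sequential combine step: one table lookup per chunk.
    state, total = 0, 0
    for t in tables:
        state, count = t[state]
        total += count
    return total


print(parallel_count("ACGACGACG", "ACGA"))  # 2
```

Because of CPython's global interpreter lock, the thread pool here shows the data decomposition rather than a real speed-up. Measuring scaling would require a native-thread version, for example in C or C++. Simulating every start state multiplies the per-chunk work by the number of DFA states. That cost is small for short DNA patterns but grows with the size of the automaton.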
