Efficient Parallel and Out of Core Algorithms for Constructing Large Bi-directed de Bruijn Graphs

Vamsi Kundeti; Sanguthevar Rajasekaran; Hieu Dinh

arXiv:1003.1940·cs.DS·March 10, 2010

TL;DR

This paper introduces a scalable, parallel, and out-of-core algorithm for constructing large bi-directed de Bruijn graphs, significantly improving efficiency and reducing communication costs compared to previous methods, with proven scalability and optimal I/O complexity.

Contribution

The paper presents a novel $\Theta(n/p)$ time parallel algorithm for bi-directed de Bruijn graph construction with low communication complexity and out-of-core adaptability, outperforming prior approaches.

Findings

01

The new algorithm runs in $\Theta(n/p)$ time and incurs lower communication cost than the earlier parallel algorithm of Jackson et al.

02

It has optimal I/O complexity in out-of-core settings.

03

Scalability is demonstrated on an SGI/Altix system.

Abstract

Assembling genomic sequences from a set of overlapping reads is one of the most fundamental problems in computational biology. Algorithms addressing the assembly problem fall into two broad categories, based on the data structures they employ. The first class uses an overlap/string graph and the second uses a de Bruijn graph. However, with the recent advances in short read sequencing technology, de Bruijn graph based algorithms seem to play a vital role in practice. Efficient algorithms for building these massive de Bruijn graphs are essential in large sequencing projects based on short reads. In Jackson et al. (ICPP 2008), an $O(n/p)$ time parallel algorithm was given for this problem. Here $n$ is the size of the input and $p$ is the number of processors. This algorithm enumerates all possible bi-directed edges which can overlap with a node and ends up generating…
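To make the object being built concrete, here is a minimal sequential sketch of a sort-based bi-directed de Bruijn graph construction. It is an illustration under stated assumptions, not the authors' parallel or out-of-core algorithm: nodes are taken to be canonical k-mers (the lexicographically smaller of a k-mer and its reverse complement), edges come from consecutive k-mers in a read, and duplicates are removed by sorting. The names `reverse_complement`, `canonical`, and `bidirected_edges` are hypothetical helpers introduced here.

```python
from typing import Iterable, List, Tuple

# (node_u, orientation_u, node_v, orientation_v); this layout is an assumption.
Edge = Tuple[str, str, str, str]

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_FLIP = {"+": "-", "-": "+"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string over A, C, G, T."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical(kmer: str) -> Tuple[str, str]:
    """Return the canonical form of a k-mer and the orientation it was seen in."""
    rc = reverse_complement(kmer)
    return (kmer, "+") if kmer <= rc else (rc, "-")


def bidirected_edges(reads: Iterable[str], k: int) -> List[Edge]:
    """Collect one record per distinct bi-directed edge, deduplicated by sorting."""
    edges: List[Edge] = []
    for read in reads:
        # Each pair of consecutive k-mers (one (k+1)-mer) contributes one edge.
        for i in range(len(read) - k):
            u, ou = canonical(read[i:i + k])
            v, ov = canonical(read[i + 1:i + k + 1])
            forward = (u, ou, v, ov)
            # The same edge read from the opposite strand appears mirrored.
            mirrored = (v, _FLIP[ov], u, _FLIP[ou])
            edges.append(min(forward, mirrored))
    edges.sort()
    # After sorting, duplicates are adjacent and can be dropped in one pass.
    unique: List[Edge] = []
    for edge in edges:
        if not unique or unique[-1] != edge:
            unique.append(edge)
    return unique
```

In this sketch a read and its reverse complement yield the same edge record, so `bidirected_edges(["AAC", "GTT"], 2)` returns a single edge. Because deduplication relies only on sorting, the same idea carries over in principle to parallel or external-memory sorting, which is the setting the paper addresses.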
