Efficient Parallel and Out of Core Algorithms for Constructing Large Bi-directed de Bruijn Graphs
Vamsi Kundeti, Sanguthevar Rajasekaran, Hieu Dinh

TL;DR
This paper introduces a scalable, parallel, and out-of-core algorithm for constructing large bi-directed de Bruijn graphs, significantly improving efficiency and reducing communication costs compared to previous methods, with proven scalability and optimal I/O complexity.
Contribution
The paper presents a novel $\Theta(n/p)$ time parallel algorithm for bi-directed de Bruijn graph construction with low communication complexity and out-of-core adaptability, outperforming prior approaches.
Findings
The new algorithm is faster than previous methods.
It has optimal I/O complexity in out-of-core settings.
Scalability was demonstrated on an SGI/Altix system.
Abstract
Assembling genomic sequences from a set of overlapping reads is one of the most fundamental problems in computational biology. Algorithms addressing the assembly problem fall into two broad categories based on the data structures they employ. The first class uses an overlap/string graph and the second uses a de Bruijn graph. However, with the recent advances in short read sequencing technology, de Bruijn graph based algorithms seem to play a vital role in practice. Efficient algorithms for building these massive de Bruijn graphs are essential in large sequencing projects based on short reads. In Jackson et al. (ICPP 2008), an $O(n/p)$ time parallel algorithm was given for this problem. Here $n$ is the size of the input and $p$ is the number of processors. This algorithm enumerates all possible bi-directed edges which can overlap with a node and ends up generating…
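To make the object being built concrete, here is a minimal sequential sketch of bi-directed de Bruijn graph construction in Python. It is not the paper's parallel or out-of-core algorithm. The function names, the `+`/`-` orientation labels, and the use of sorting to remove duplicate edges are assumptions made for illustration. The sketch also assumes an odd `k` over the alphabet `ACGT`, so that no k-mer equals its own reverse complement. It visits only the (k+1)-mers that actually occur in the reads, rather than enumerating every possible edge around a node.

```python
from typing import List, Tuple

# Hypothetical edge representation: (node_a, orient_a, node_b, orient_b),
# where each node is a canonical k-mer and the orientation says whether the
# read used the canonical form ('+') or its reverse complement ('-').
Edge = Tuple[str, str, str, str]

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string over ACGT."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical(kmer: str) -> Tuple[str, str]:
    """Return (canonical k-mer, orientation of the input relative to it)."""
    rc = reverse_complement(kmer)
    return (kmer, "+") if kmer <= rc else (rc, "-")


def _flip(orient: str) -> str:
    return "-" if orient == "+" else "+"


def bidirected_edges(reads: List[str], k: int) -> List[Edge]:
    """Collect one bi-directed edge per (k+1)-mer in the reads, then sort
    and drop duplicates. Assumes k is odd (no palindromic k-mers)."""
    edges: List[Edge] = []
    for read in reads:
        for i in range(len(read) - k):
            a, oa = canonical(read[i:i + k])
            b, ob = canonical(read[i + 1:i + k + 1])
            forward = (a, oa, b, ob)
            # The same overlap read from the opposite strand.
            mirror = (b, _flip(ob), a, _flip(oa))
            edges.append(min(forward, mirror))
    edges.sort()
    unique: List[Edge] = []
    for edge in edges:
        if not unique or unique[-1] != edge:
            unique.append(edge)
    return unique
```

A read and its reverse complement yield the same edge list, for example `bidirected_edges(["ACGTA"], 3) == bidirected_edges(["TACGT"], 3)`. In this sketch the work is proportional to the number of (k+1)-mers in the input plus the cost of the sort.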
