Assembling Omnitigs using Hidden-Order de Bruijn Graphs

Diego Díaz-Domínguez; Djamal Belazzougui; Travis Gagie; Veli Mäkinen; Gonzalo Navarro; Simon J. Puglisi

arXiv:1805.05228 · cs.DS · May 15, 2018 · 1 citation

Open Access

TL;DR

This paper introduces a space-efficient method for navigating variable-order de Bruijn graphs and extracting safe strings, which can be more informative for DNA assembly than traditional unitigs.
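For readers unfamiliar with unitigs, the baseline the paper compares against, the Python sketch below compacts an edge-centric de Bruijn graph (nodes are (k-1)-mers, edges are k-mers) into its maximal non-branching paths. It is a plain, uncompressed textbook construction written for illustration only; the function name `unitigs` is hypothetical, and this is not the paper's succinct safe-string extraction.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def unitigs(kmers: Iterable[str]) -> List[str]:
    """Spell the maximal non-branching paths of an edge-centric de Bruijn graph.

    Hypothetical helper: nodes are (k-1)-mers and every distinct k-mer is an
    edge from its prefix to its suffix. All k-mers must have the same length.
    """
    edges = sorted(set(kmers))
    out_edges: Dict[str, List[str]] = defaultdict(list)
    in_deg: Dict[str, int] = defaultdict(int)
    for e in edges:
        u, v = e[:-1], e[1:]
        out_edges[u].append(v)
        in_deg[v] += 1
    nodes = sorted(set(out_edges) | set(in_deg))

    def simple(x: str) -> bool:
        # A node can sit inside a unitig only if it has in-degree 1 and out-degree 1.
        return in_deg[x] == 1 and len(out_edges[x]) == 1

    used: Set[str] = set()

    def extend(u: str, v: str) -> str:
        # Follow edge u -> v, then keep walking while the current node is simple.
        spelled = u + v[-1]
        used.add(spelled)
        while simple(v):
            w = out_edges[v][0]
            edge = v + w[-1]
            if edge in used:  # only happens when closing an isolated cycle
                break
            used.add(edge)
            spelled += w[-1]
            v = w
        return spelled

    result: List[str] = []
    for u in nodes:
        if not simple(u):
            for v in out_edges[u]:
                result.append(extend(u, v))
    # Edges never reached lie on cycles made only of simple nodes.
    for e in edges:
        if e not in used:
            result.append(extend(e[:-1], e[1:]))
    return result
```

For example, `unitigs(["ACG", "CGT", "GTT"])` returns `["ACGTT"]`, while adding the branching edge `"CGA"` splits the output into `["ACG", "CGA", "CGTT"]`, since the node `CG` then has two successors.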

Contribution

It replaces the LCP array with a succinct representation of that array's Cartesian tree, enabling efficient navigation and extraction of safe strings in variable-order de Bruijn graphs.
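A minimal sketch of why a Cartesian tree fits in 2 bits per entry: in the usual stack-based construction over an array (here a toy LCP array), every position is pushed once and popped once, so recording a `1` per push and a `0` per pop yields a 2n-bit balanced sequence from which the tree shape can be rebuilt. The function names and sample values are hypothetical, the bits are kept as a Python string rather than a rank/select-equipped bitvector, and the paper's actual encoding and supported operations may differ.

```python
from typing import List, Optional


def cartesian_tree_bits(lcp: List[int]) -> str:
    """Encode the shape of the min-Cartesian tree of `lcp` in 2n bits.

    Stack-based construction: each index is pushed once ('1') and popped
    once ('0'), so the result has exactly 2 * len(lcp) symbols.
    Ties keep the earlier index higher in the tree.
    """
    bits: List[str] = []
    stack: List[int] = []
    for i, value in enumerate(lcp):
        while stack and lcp[stack[-1]] > value:
            stack.pop()
            bits.append("0")
        stack.append(i)
        bits.append("1")
    bits.extend("0" * len(stack))  # pop whatever remains at the end
    return "".join(bits)


def parents_from_bits(bits: str) -> List[Optional[int]]:
    """Rebuild the Cartesian-tree parent array from the bits alone."""
    n = bits.count("1")
    parent: List[Optional[int]] = [None] * n
    stack: List[int] = []
    last_popped: Optional[int] = None
    next_index = 0
    for b in bits:
        if b == "0":
            last_popped = stack.pop()
            continue
        i = next_index
        next_index += 1
        if stack:
            parent[i] = stack[-1]  # i becomes the right child of the stack top
        if last_popped is not None:
            parent[last_popped] = i  # the last popped subtree becomes i's left child
        last_popped = None
        stack.append(i)
    return parent
```

For the toy array `[0, 2, 1, 3, 1]`, `cartesian_tree_bits` returns `"1101101000"` (10 bits for 5 entries), and decoding it gives parents `[None, 2, 0, 4, 2]`: position 0 holds the overall minimum and position 2 the minimum of positions 1 to 4. The shape alone locates range minima, but it does not store the LCP values themselves.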

Findings

1. Extracted more informative strings than unitigs
2. Used only 2 extra bits per edge for navigation
3. Achieved efficient safe-string extraction in experiments

Abstract

De novo DNA assembly is a fundamental task in bioinformatics, and finding Eulerian paths on de Bruijn graphs is one of the dominant approaches to it. In most cases, there may be no single order for the de Bruijn graph that works well for assembling all of the reads. For this reason, some de Bruijn-based assemblers try assembling on several graphs of increasing order, in turn. Boucher et al. (2015) went further and gave a representation making it possible to navigate in the graph and change order on the fly, up to a maximum $K$, but they can use up to $\lg K$ extra bits per edge because they use an LCP array. In this paper, we replace the LCP array by a succinct representation of that array's Cartesian tree, which takes only 2 extra bits per edge and still lets us support interesting navigation operations efficiently. These operations are not enough to let us easily extract unitigs…
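To illustrate the role the abstract assigns to the LCP array, here is a simplified Python sketch in the spirit of variable-order BOSS-style indexes, as we understand them: node labels of order K are sorted colexicographically, so the array `lcs` (what the abstract calls the LCP array) records the longest common suffix of neighbouring labels, and lowering the order to k maps a row to the maximal interval of rows whose labels end with the same k characters. Assumptions: one row per distinct label (real indexes have one row per edge), hypothetical function names such as `order_k_interval`, and linear scans where a succinct index would use fast range queries.

```python
from typing import List, Tuple


def colex_sorted(labels: List[str]) -> List[str]:
    # Sort distinct node labels colexicographically (compare them right to left).
    return sorted(set(labels), key=lambda s: s[::-1])


def common_suffix(a: str, b: str) -> int:
    # Length of the longest common suffix of a and b.
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n


def lcs_array(nodes: List[str]) -> List[int]:
    # lcs[i] = longest common suffix of nodes[i - 1] and nodes[i]; lcs[0] = 0.
    return [0] + [common_suffix(nodes[i - 1], nodes[i]) for i in range(1, len(nodes))]


def order_k_interval(lcs: List[int], i: int, k: int) -> Tuple[int, int]:
    # Hypothetical helper: rows sharing their last k characters with row i are
    # contiguous in colex order, so widen [lo, hi] while boundary lcs values are >= k.
    lo = i
    while lo > 0 and lcs[lo] >= k:
        lo -= 1
    hi = i
    while hi + 1 < len(lcs) and lcs[hi + 1] >= k:
        hi += 1
    return lo, hi
```

With labels `ACG`, `TCG`, `GCG`, `ACT`, `GAC`, the colex order is `GAC, ACG, GCG, TCG, ACT`, the `lcs` array is `[0, 0, 2, 2, 0]`, and lowering row 2 (`GCG`) to order 2 yields rows 1 to 3, the three labels ending in `CG`. Each `lcs` entry lies in $[0, K]$, so storing it as a plain integer costs on the order of $\lg K$ bits, which is the per-edge overhead the abstract attributes to the LCP array.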

Taxonomy

Topics: Algorithms and Data Compression · DNA and Biological Computing · Genome Rearrangement Algorithms