Assembling Omnitigs using Hidden-Order de Bruijn Graphs
Diego Díaz-Domínguez, Djamal Belazzougui, Travis Gagie, Veli Mäkinen, Gonzalo Navarro, Simon J. Puglisi

TL;DR
This paper introduces a space-efficient method for navigating variable-order de Bruijn graphs to extract safe strings, improving DNA assembly by capturing more informative sequences than traditional unitigs.
Contribution
It replaces the LCP array with a succinct Cartesian tree representation, enabling efficient navigation and extraction of safe strings in variable-order de Bruijn graphs.
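As a rough illustration of why a Cartesian tree can cost only 2 bits per array entry, the sketch below encodes the shape of an array's min-rooted Cartesian tree as a balanced-parentheses string of length 2n. It uses a standard stack-based construction and is not the authors' implementation; the function name `cartesian_tree_parens` and the toy LCP values are assumptions made for this example, and the paper's exact encoding may differ.

```python
def cartesian_tree_parens(values):
    """Encode the shape of the min-rooted Cartesian tree of `values`
    as a balanced-parentheses string with exactly 2 * len(values) symbols.

    Hypothetical helper for illustration only. Each element is pushed once
    (emitting "(") and popped once (emitting ")"), so the encoding costs
    2 bits per element no matter how large the stored values are.
    """
    bits = []
    stack = []  # values on the current rightmost path of the tree
    for v in values:
        # Entries strictly larger than v leave the rightmost path.
        while stack and stack[-1] > v:
            stack.pop()
            bits.append(")")
        stack.append(v)
        bits.append("(")
    # Close every entry still on the rightmost path.
    bits.extend(")" * len(stack))
    return "".join(bits)


# Toy LCP-like arrays (made-up values, not data from the paper).
lcp_a = [0, 1, 3, 2, 1]
lcp_b = [5, 6, 9, 7, 6]  # different values, same relative order

encoding_a = cartesian_tree_parens(lcp_a)
encoding_b = cartesian_tree_parens(lcp_b)
```

The two arrays produce the same string because the encoding keeps only the tree shape, that is, where the range minima fall, and discards the values themselves. A practical structure would store the parentheses as a bit vector with rank/select support rather than as a Python string.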
Findings
Extracted more informative strings than unitigs
Used only 2 extra bits per edge for navigation
Achieved efficient safe string extraction in experiments
Abstract
De novo DNA assembly is a fundamental task in Bioinformatics, and finding Eulerian paths on de Bruijn graphs is one of the dominant approaches to it. In most of the cases, there may be no one order for the de Bruijn graph that works well for assembling all of the reads. For this reason, some de Bruijn-based assemblers try assembling on several graphs of increasing order, in turn. Boucher et al. (2015) went further and gave a representation making it possible to navigate in the graph and change order on the fly, up to a maximum K, but they can use up to lg K extra bits per edge because they use an LCP array. In this paper, we replace the LCP array by a succinct representation of that array's Cartesian tree, which takes only 2 extra bits per edge and still lets us support interesting navigation operations efficiently. These operations are not enough to let us easily extract unitigs…
Taxonomy
Topics: Algorithms and Data Compression · DNA and Biological Computing · Genome Rearrangement Algorithms
