Implementation Of Dynamic De Bruijn Graphs Via Learned Index

Riccardo Nigrelli

arXiv:2406.12339·cs.DS·June 19, 2024

TL;DR

This paper implements dynamic De Bruijn graphs on top of learned indexes and reports better time and memory efficiency for edge and node insertions, particularly on large datasets (over 110 million k-mers).

Contribution

It presents a new method that uses learned indexes to implement dynamic De Bruijn graphs and compares it with current implementations, reporting improved time and memory efficiency for edge and node insertions.

Findings

01. Improved insertion time for large datasets

02. Reduced memory footprint compared to traditional methods

03. Effective handling of over 110 million k-mers

Abstract

De Bruijn graphs are essential for sequencing data analysis and must be efficiently constructed and stored for large-scale population studies. They also need to be dynamic to allow updates such as adding or removing edges and nodes. Existing dynamic implementations include DynamicBOSS and dynamicDBG. In 2018, a new family of data structures called learned indexes was introduced by Tim Kraska and Alex Beutel, with a particularly efficient implementation proposed by Paolo Ferragina and Giorgio Vinciguerra in 2020. This paper presents a new method for implementing De Bruijn graphs using learned indexes and compares its performance with current implementations. The new method shows improved time and memory efficiency for edge and node insertions, particularly with large datasets (over 110 million k-mers).
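To make the two ingredients of the abstract concrete, here is a minimal Python sketch of a learned-index-style membership lookup over the k-mers of a node-centric De Bruijn graph. It is an illustration under stated assumptions, not the paper's method: the names `encode`, `kmers_of`, and `LearnedKmerSet` are hypothetical, a single linear model stands in for the piecewise models used by practical learned indexes, and the structure is static, so it does not attempt the dynamic insertions and deletions the paper targets.

```python
from bisect import bisect_left

BASES = "ACGT"
CODE = {base: i for i, base in enumerate(BASES)}


def encode(kmer):
    """Pack a k-mer into an integer using 2 bits per base."""
    value = 0
    for base in kmer:
        value = (value << 2) | CODE[base]
    return value


def kmers_of(sequence, k):
    """Return every length-k substring of the sequence, in order."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


class LearnedKmerSet:
    """Sorted k-mer keys plus one linear model that predicts a key's position.

    Hypothetical illustration: one model over all keys, no updates.
    """

    def __init__(self, kmers):
        self.keys = sorted({encode(kmer) for kmer in kmers})
        if not self.keys:
            raise ValueError("need at least one k-mer")
        first, last = self.keys[0], self.keys[-1]
        self.first = first
        # Linear model: position ~ slope * (key - first).
        self.slope = (len(self.keys) - 1) / (last - first) if last > first else 0.0
        # Largest prediction error over the stored keys; it bounds the search window.
        self.eps = max(abs(self._predict(key) - i) for i, key in enumerate(self.keys))

    def _predict(self, key):
        guess = round(self.slope * (key - self.first))
        return min(max(guess, 0), len(self.keys) - 1)

    def __contains__(self, kmer):
        key = encode(kmer)
        guess = self._predict(key)
        # Binary search only inside [guess - eps, guess + eps].
        lo = max(guess - self.eps, 0)
        hi = min(guess + self.eps + 1, len(self.keys))
        i = bisect_left(self.keys, key, lo, hi)
        return i < hi and self.keys[i] == key

    def successors(self, kmer):
        """K-mers in the set that overlap this one by its last k-1 bases."""
        return [kmer[1:] + b for b in BASES if kmer[1:] + b in self]


graph = LearnedKmerSet(kmers_of("ACGTACGA", 3))
```

In this sketch the model narrows each lookup to a window of at most `2 * eps + 1` keys, which is the basic idea behind learned indexes; an edge query then becomes a handful of membership tests on the candidate successor k-mers.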

Taxonomy

Topics: Advanced Computational Techniques and Applications · Image Retrieval and Classification Techniques