When is String Reconstruction using de Bruijn Graphs Hard?

Ben Bals; Sebastiaan van Krieken; Solon P. Pissis; Leen Stougie; Hilde Verbeek

arXiv:2508.03433 · cs.DS · October 17, 2025


TL;DR

This paper studies the computational complexity of reconstructing a string from a de Bruijn graph when domain knowledge restricts the positions at which each length-k substring may occur. It presents new algorithms that improve on previous exponential-time solutions, particularly when the position intervals are small relative to k.

Contribution

The paper introduces improved algorithms for string reconstruction on de Bruijn graphs whose running time depends on the size of the position intervals relative to k, which clarifies when the problem is hard.

Findings

1. The problem is NP-complete in general.

2. New algorithms outperform previous exponential-time methods.

3. Efficiency improves when position ranges are small compared to k.

Abstract

The reduction of the fragment assembly problem to (variations of) the classical Eulerian trail problem [Pevzner et al., PNAS 2001] has led to remarkable progress in genome assembly. This reduction employs the notion of de Bruijn graph $G = (V, E)$ of order $k$ over an alphabet $\Sigma$. A single Eulerian trail in $G$ represents a candidate genome reconstruction. Bernardini et al. have also introduced the complementary idea in data privacy [ALENEX 2020] based on $z$-anonymity. The pressing question is: How hard is it to reconstruct a best string from a de Bruijn graph given a function that models domain knowledge? Such a function maps every length-$k$ string to an interval of positions where it may occur in the reconstructed string. By the above reduction to de Bruijn graphs, the latter function translates into a function $c$ mapping every edge to an interval where it may occur in an…
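To make the setup in the abstract concrete, the following is a minimal sketch, not the paper's algorithm. It builds the de Bruijn graph edges from a collection of length-$k$ strings and searches, by plain exponential-time backtracking, for an Eulerian trail in which every edge is used at a position inside its allowed interval. The function names `de_bruijn_edges` and `respecting_trail`, the dictionary representation of $c$, and the convention that intervals are 1-indexed positions along the trail are all assumptions made for illustration.

```python
from collections import defaultdict


def de_bruijn_edges(kmers):
    """Turn each length-k string into an edge from its (k-1)-prefix
    to its (k-1)-suffix, keeping the string itself as the edge label."""
    return [(s[:-1], s[1:], s) for s in kmers]


def respecting_trail(kmers, c):
    """Brute-force search for an Eulerian trail whose i-th edge
    (1-indexed, an assumed convention) satisfies lo <= i <= hi,
    where (lo, hi) = c[edge label].

    Returns the spelled-out string, or None if no such trail exists.
    Runs in exponential time in the worst case; illustration only.
    """
    edges = de_bruijn_edges(kmers)
    outgoing = defaultdict(list)
    for idx, (u, _, _) in enumerate(edges):
        outgoing[u].append(idx)

    used = [False] * len(edges)
    trail = []  # edge labels in the order they are traversed

    def extend(node):
        if len(trail) == len(edges):
            return True  # every edge used exactly once
        pos = len(trail) + 1
        for idx in outgoing[node]:
            _, v, label = edges[idx]
            lo, hi = c[label]
            if used[idx] or not (lo <= pos <= hi):
                continue
            used[idx] = True
            trail.append(label)
            if extend(v):
                return True
            trail.pop()
            used[idx] = False
        return False

    # Try every node with an outgoing edge as a starting point.
    for start in sorted({u for u, _, _ in edges}):
        if extend(start):
            # First k-mer in full, then the last letter of each later k-mer.
            return trail[0] + "".join(s[-1] for s in trail[1:])
    return None


kmers = ["aba", "baa", "aab"]  # the length-3 substrings of "abaab"
print(respecting_trail(kmers, {"aba": (1, 1), "baa": (2, 3), "aab": (2, 3)}))
```

In this toy instance the three edges form a single cycle, so without constraints there are three Eulerian trails (one per starting node). Pinning "aba" to position 1 leaves only one of them. The backtracking here is deliberately naive; it is meant to show what the interval function constrains, not how to exploit small intervals efficiently.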
