The Complexity of Approximate Pattern Matching on De Bruijn Graphs
Daniel Gibney, Sharma V. Thankachan, and Srinivas Aluru

TL;DR
This paper proves that approximate pattern matching on de Bruijn graphs with substitutions is NP-complete and unlikely to be solvable in less than quadratic time, in contrast to exact matching, which these graphs support in linear time.
Contribution
It establishes the NP-completeness of approximate pattern matching on de Bruijn graphs with substitutions, showing the problem's computational hardness.
Findings
Approximate matching with substitutions is NP-complete on de Bruijn graphs.
Subquadratic-time algorithms are unlikely for this problem.
Exact matching on de Bruijn graphs can be done in linear time.
Abstract
Aligning a sequence to a walk in a labeled graph is a problem of fundamental importance to Computational Biology. For finding a walk in an arbitrary graph with |E| edges that exactly matches a pattern of length m, a lower bound based on the Strong Exponential Time Hypothesis (SETH) implies an algorithm significantly faster than O(|E|m) time is unlikely [Equi et al., ICALP 2019]. However, for many special graphs, such as de Bruijn graphs, the problem can be solved in linear time [Bowe et al., WABI 2012]. For approximate matching, the picture is more complex. When edits (substitutions, insertions, and deletions) are only allowed to the pattern, or when the graph is acyclic, the problem is again solvable in O(|E|m) time. When edits are allowed to arbitrary cyclic graphs, the problem becomes NP-complete, even on binary alphabets [Jain et al., RECOMB 2019]. These results hold even…
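To make the tractable case concrete, here is a minimal sketch of the standard dynamic program for matching a pattern to a walk when substitutions are applied only to the pattern. It is not taken from the paper. It assumes a vertex-labeled graph with one character per vertex, handles substitutions only (no insertions or deletions), and uses a hypothetical function name, `min_substitutions`. Each pattern position costs one pass over the vertices and edges, so the total time is proportional to (|V| + |E|) times the pattern length.

```python
import math
from typing import Dict, List, Tuple


def min_substitutions(labels: Dict[int, str],
                      edges: List[Tuple[int, int]],
                      pattern: str) -> float:
    """Fewest substitutions to the pattern so that it spells some walk.

    Assumption (not from the paper): each vertex carries a single
    character, and a walk spells the concatenation of its vertex labels.
    Returns math.inf if the graph has no walk with len(pattern) vertices.
    """
    if not pattern:
        return 0
    # cost[v]: fewest substitutions to match the prefix read so far
    # along a walk that ends at vertex v.
    cost = {v: float(labels[v] != pattern[0]) for v in labels}
    for ch in pattern[1:]:
        # Cheapest way to arrive at each vertex from any predecessor.
        best_pred = {v: math.inf for v in labels}
        for u, v in edges:
            if cost[u] < best_pred[v]:
                best_pred[v] = cost[u]
        # Pay one substitution if the vertex label differs from ch.
        cost = {v: best_pred[v] + (labels[v] != ch) for v in labels}
    return min(cost.values(), default=math.inf)


# Usage: a directed 3-cycle spelling A -> C -> G -> A -> ...
labels = {0: "A", 1: "C", 2: "G"}
edges = [(0, 1), (1, 2), (2, 0)]
print(min_substitutions(labels, edges, "ACTA"))
```

The hard case described in the abstract differs in where the edits go: once substitutions may be applied to the graph's labels rather than the pattern, a change at one vertex affects every later visit to it on a cyclic walk, so this per-position recurrence no longer applies.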
Taxonomy
Topics: Algorithms and Data Compression · DNA and Biological Computing · Genomics and Phylogenetic Studies
