The Tandem Duplication Distance is NP-hard

Manuel Lafond; Binhai Zhu; Peng Zou

arXiv:1906.05266·cs.DS·June 13, 2019

The Tandem Duplication Distance is NP-hard

Manuel Lafond, Binhai Zhu, Peng Zou

PDF

TL;DR

This paper proves that computing the tandem duplication distance between two strings is NP-hard, even with all characters distinct, and explores the problem's complexity and fixed-parameter tractability in bioinformatics.

Contribution

It establishes the NP-hardness of the tandem duplication distance problem and introduces the Cost-Effective Subgraph problem with W[1]-hardness results, advancing understanding of computational complexity in bioinformatics.

Findings

01

Proved tandem duplication distance is NP-hard.

02

Demonstrated NP-hardness even for exemplar cases with distinct characters.

03

Showed fixed-parameter tractability for the exemplar TD distance.

Abstract

In computational biology, tandem duplication is an important biological phenomenon which can occur either at the genome or at the DNA level. A tandem duplication takes a copy of a genome segment and inserts it right after the segment - this can be represented as the string operation $A X B \Rightarrow A X X B$ . For example, Tandem exon duplications have been found in many species such as human, fly or worm, and have been largely studied in computational biology. The Tandem Duplication (TD) distance problem we investigate in this paper is defined as follows: given two strings $S$ and $T$ over the same alphabet, compute the smallest sequence of tandem duplications required to convert $S$ to $T$ . The natural question of whether the TD distance can be computed in polynomial time was posed in 2004 by Leupold et al. and had remained open, despite the fact that tandem duplications have received…

Figures6

Click any figure to enlarge with its caption.

Figure 3

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.