Phase transition in the computational complexity of the shortest common superstring and genome assembly
L. A. Fernandez, V. Martin-Mayor, D. Yllanes

TL;DR
This paper investigates the computational complexity of genome assembly, revealing a phase transition that explains why practical instances are often solvable efficiently despite the problem's NP-hardness.
Contribution
The study demonstrates a phase transition in the complexity of the shortest common superstring problem and introduces a Markov-chain Monte Carlo method that outperforms deterministic algorithms in hard cases.
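The Monte Carlo idea can be illustrated with a minimal Python sketch. It assumes the simplest possible formulation. The state is an ordering of the reads. Its energy is the length of the string obtained by merging each pair of consecutive reads at their maximal suffix-prefix overlap. Moves swap two reads and are accepted with the Metropolis rule at a fixed temperature. This is a generic sampler, not the authors' method. The names `overlap`, `assemble` and `mcmc_superstring` and the parameters `temperature`, `steps` and `seed` are hypothetical.

```python
import math
import random


def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def assemble(order: list[str]) -> str:
    """Merge reads left to right, each at its maximal overlap with the previous one.

    The result contains every read, so it is a valid common superstring
    for any ordering.
    """
    s = order[0]
    for prev, cur in zip(order, order[1:]):
        s += cur[overlap(prev, cur):]
    return s


def mcmc_superstring(reads: list[str], temperature: float = 1.0,
                     steps: int = 20000, seed: int = 0) -> str:
    """Hypothetical Metropolis sampler over read orderings (illustrative only)."""
    rng = random.Random(seed)
    order = list(reads)
    if len(order) < 2:
        return assemble(order) if order else ""
    rng.shuffle(order)
    length = len(assemble(order))
    best_order, best_len = order[:], length
    for _ in range(steps):
        # Propose swapping two distinct positions in the ordering.
        i, j = rng.sample(range(len(order)), 2)
        order[i], order[j] = order[j], order[i]
        new_len = len(assemble(order))
        delta = new_len - length
        # Metropolis rule: always accept shorter superstrings,
        # accept longer ones with probability exp(-delta / temperature).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            length = new_len
            if length < best_len:
                best_order, best_len = order[:], length
        else:
            order[i], order[j] = order[j], order[i]  # reject: undo the swap
    return assemble(best_order)
```

Lowering `temperature` makes uphill moves rarer. Recomputing every adjacent overlap at each step keeps the sketch short, but it is far slower than an incremental update that only touches the reads next to the swapped positions.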
Findings
Existence of a phase transition in problem complexity
Practical instances are typically in the 'easy' phase
Proposed MCMC method outperforms deterministic algorithms in hard regimes
Abstract
Genome assembly, the process of reconstructing a long genetic sequence by aligning and merging short fragments, or reads, is known to be NP-hard, either as a version of the shortest common superstring problem or in a Hamiltonian-cycle formulation. That is, the computing time is believed to grow exponentially with the problem size in the worst case. Despite this fact, high-throughput technologies and modern algorithms currently allow bioinformaticians to handle datasets of billions of reads. Using methods from statistical mechanics, we address this conundrum by demonstrating the existence of a phase transition in the computational complexity of the problem and showing that practical instances always fall in the 'easy' phase (solvable by polynomial-time algorithms). In addition, we propose a Markov-chain Monte Carlo method that outperforms common deterministic algorithms in the hard…
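The classic greedy merge heuristic is one example of a polynomial-time deterministic algorithm for the shortest common superstring. It discards reads contained in other reads. It then repeatedly merges the pair with the largest suffix-prefix overlap. The following Python sketch is illustrative only. The abstract does not say which deterministic algorithms were benchmarked, and the names `overlap` and `greedy_superstring` are hypothetical.

```python
def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_superstring(reads: list[str]) -> str:
    """Greedy merge heuristic for the shortest common superstring (sketch)."""
    # Remove exact duplicates, keeping first occurrences in order.
    reads = list(dict.fromkeys(reads))
    # Drop reads that are substrings of other reads; they are covered anyway.
    reads = [r for r in reads if not any(r != s and r in s for s in reads)]
    while len(reads) > 1:
        # Find the ordered pair (i, j) with the largest overlap of reads[i] onto reads[j].
        best_k, best_i, best_j = -1, 0, 1
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        # Merge the pair and replace both reads with the merged string.
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)
    return reads[0] if reads else ""
```

For the reads ATTAG, TAGGC and GGCAT, the sketch returns ATTAGGCAT. It first merges ATTAG and TAGGC (overlap 3), then appends GGCAT (overlap 3).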
Taxonomy
Topics: Algorithms and Data Compression · RNA and protein synthesis mechanisms · Genomics and Phylogenetic Studies
