Hardness of Covering Alignment: Phase Transition in Post-Sequence Genomics
Romeo Rizzi, Massimo Cairo, Veli Mäkinen, Alexandru I. Tomescu, Daniel Valenzuela

TL;DR
This paper demonstrates that the problem of covering alignment in labeled DAGs, relevant for pan-genome and diploid genome analysis, is NP-hard, revealing a phase transition in computational complexity for these genomics applications.
Contribution
It establishes the NP-hardness of covering alignment in labeled DAGs, extending sequence alignment complexity results to more general graph structures in genomics.
Findings
Covering alignment in labeled DAGs is NP-hard on binary alphabets.
Recombination-oblivious diploid alignment is NP-hard on alphabets of size 3.
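Shows a phase transition in computational complexity: pairwise sequence alignment is solvable in polynomial time by dynamic programming, whereas its minimal extension to labeled DAGs (covering alignment) is NP-hard.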
Abstract
Covering alignment problems arise from recent developments in genomics: so-called pan-genome graphs are replacing reference genomes, and advances in haplotyping enable the full content of diploid genomes to be used as the basis of sequence analysis. In this paper, we show that the computational complexity changes for natural extensions of alignments to pan-genome representations and to diploid genomes. More broadly, our approach can also be seen as a minimal extension of sequence alignment to labeled directed acyclic graphs (labeled DAGs). Namely, we show that finding a covering alignment of two labeled DAGs is NP-hard even on binary alphabets. A covering alignment asks for two paths $R_1$ (red) and $G_1$ (green) in DAG $D_1$ and two paths $R_2$ (red) and $G_2$ (green) in DAG $D_2$ that cover the nodes of the graphs and maximize the sum of the global alignment scores:…
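To make the definition concrete, here is a minimal brute-force sketch in Python; it is an illustration under stated assumptions, not the authors' method. It assumes each DAG is given as a hypothetical tuple (labels, edges, source, sink) with one label per node and a unique source and sink, that the red and green paths run from source to sink, and that global alignment uses a hypothetical unit scoring scheme (match +1, mismatch -1, gap -1). All function names are illustrative. The pairwise score is computed by Needleman-Wunsch dynamic programming in polynomial time, while the search over covering path pairs enumerates all source-to-sink paths, which can be exponentially many; the NP-hardness result means no polynomial-time algorithm is expected for the general problem unless P = NP.

```python
from itertools import product

# Hypothetical unit scoring scheme (an assumption, not taken from the paper).
MATCH, MISMATCH, GAP = 1, -1, -1


def global_score(a: str, b: str) -> int:
    """Needleman-Wunsch global alignment score in O(len(a) * len(b)) time."""
    n, m = len(a), len(b)
    prev = [j * GAP for j in range(m + 1)]  # row for the empty prefix of a
    for i in range(1, n + 1):
        cur = [i * GAP] + [0] * m
        for j in range(1, m + 1):
            diag = prev[j - 1] + (MATCH if a[i - 1] == b[j - 1] else MISMATCH)
            cur[j] = max(diag, prev[j] + GAP, cur[j - 1] + GAP)
        prev = cur
    return prev[m]


def source_sink_paths(edges, source, sink):
    """Enumerate all source-to-sink paths (exponentially many in general)."""
    succ = {}
    for u, v in edges:
        succ.setdefault(u, []).append(v)
    paths = []

    def walk(node, path):
        if node == sink:
            paths.append(path)
            return
        for nxt in succ.get(node, []):
            walk(nxt, path + [nxt])

    walk(source, [source])
    return paths


def spell(labels, path):
    """sp(P): concatenation of the node labels along path P."""
    return "".join(labels[v] for v in path)


def covering_alignment(dag1, dag2):
    """Best red+green score over path pairs that cover every node of each DAG.

    Returns None if some DAG cannot be covered by two source-to-sink paths.
    """
    covers = []
    for labels, edges, s, t in (dag1, dag2):
        paths = source_sink_paths(edges, s, t)
        # Keep ordered (red, green) pairs whose union covers all nodes.
        pairs = [(r, g) for r in paths for g in paths
                 if set(r) | set(g) == set(labels)]
        covers.append((labels, pairs))
    (lab1, pairs1), (lab2, pairs2) = covers
    best = None
    for (r1, g1), (r2, g2) in product(pairs1, pairs2):
        score = (global_score(spell(lab1, r1), spell(lab2, r2))
                 + global_score(spell(lab1, g1), spell(lab2, g2)))
        if best is None or score > best:
            best = score
    return best


if __name__ == "__main__":
    # Two "bubble" DAGs, each spelling two haplotype-like strings.
    bubble = [(0, 1), (0, 2), (1, 3), (2, 3)]
    d1 = ({0: "A", 1: "C", 2: "G", 3: "T"}, bubble, 0, 3)  # ACT / AGT
    d2 = ({0: "A", 1: "G", 2: "C", 3: "T"}, bubble, 0, 3)  # AGT / ACT
    print(covering_alignment(d1, d2))  # → 6 (ACT~ACT and AGT~AGT)
```

In the usage example, pairing red with red and green with green would score only 2, so the optimum swaps which branch is called red in the second DAG; the covering constraint forces both branches of each bubble to be used.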
