
TL;DR
This paper proves that the Maximal Strip Recovery (MSR) problem in comparative genomics is APX-hard for any number d ≥ 2 of genomic maps, establishing fundamental limits on approximation and providing new bounds and algorithms.
Contribution
It establishes APX-hardness and NP-hardness bounds for MSR, extending inapproximability results to related genomic problems, and improves approximation algorithms for multiple genomes.
Findings
MSR is APX-hard for all d ≥ 2.
MSR cannot be approximated within Ω(d/log d) unless P=NP.
A polynomial-time approximation algorithm exists even when d is not a constant.
Abstract
In comparative genomics, the first step of sequence analysis is usually to decompose two or more genomes into syntenic blocks that are segments of homologous chromosomes. For the reliable recovery of syntenic blocks, noise and ambiguities in the genomic maps need to be removed first. Maximal Strip Recovery (MSR) is an optimization problem proposed by Zheng, Zhu, and Sankoff for reliably recovering syntenic blocks from genomic maps in the midst of noise and ambiguities. Given d genomic maps as sequences of gene markers, the objective of MSR-d is to find d subsequences, one subsequence of each genomic map, such that the total length of syntenic blocks in these subsequences is maximized. For any constant d ≥ 2, a polynomial-time 2d-approximation for MSR-d was previously known. In this paper, we show that for any d ≥ 2, MSR-d is APX-hard, even for the most basic version of…
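To make the objective concrete, here is a minimal brute-force sketch for the two-map case on toy inputs. It is not the paper's algorithm and it runs in exponential time. It rests on assumptions that the excerpt above does not state: each map is a list of distinct signed integers over the same marker set, and a syntenic block ("strip") is a run of at least two markers that appears consecutively in both maps, either in the same order or reversed with every sign negated. The function names are hypothetical.

```python
from itertools import combinations


def restrict(genome, keep):
    """Subsequence of `genome` containing only markers whose id is in `keep`."""
    return [m for m in genome if abs(m) in keep]


def is_strip_decomposable(g1, g2):
    """True if every marker of g1 lies in a strip of length >= 2 shared with g2.

    Assumption: a strip appears in g2 either verbatim or reversed and negated.
    """
    if not g1:
        return False
    # Adjacent pairs of g2, plus their reversed-and-negated counterparts.
    adjacent = set()
    for a, b in zip(g2, g2[1:]):
        adjacent.add((a, b))
        adjacent.add((-b, -a))
    run = 1  # length of the current maximal run of linked markers in g1
    for a, b in zip(g1, g1[1:]):
        if (a, b) in adjacent:
            run += 1
        else:
            if run < 2:  # an isolated marker cannot belong to any strip
                return False
            run = 1
    return run >= 2


def msr2_brute_force(g1, g2):
    """Try marker subsets from largest to smallest; return the first feasible pair."""
    markers = sorted(abs(m) for m in g1)
    for size in range(len(markers), 1, -1):
        for keep in combinations(markers, size):
            keep = set(keep)
            r1, r2 = restrict(g1, keep), restrict(g2, keep)
            if is_strip_decomposable(r1, r2):
                return r1, r2
    return [], []


# Marker 5 is noise: deleting it merges the blocks (1, 2) and (3, 4).
print(msr2_brute_force([1, 2, 3, 4, 5], [1, 2, 5, 3, 4]))
# -> ([1, 2, 3, 4], [1, 2, 3, 4])
```

The search enumerates all marker subsets, so it is only useful for checking tiny instances by hand. The hardness results summarized above concern how well polynomial-time algorithms can approximate this objective.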
