On the Approximability of Comparing Genomes with Duplicates
Sébastien Angibaud (LINA), Guillaume Fertin (LINA), Irena Rusu (LINA), Annelyse Thevenin (LRI), Stéphane Vialette (IGM)

TL;DR
This paper investigates the computational complexity of comparing genomes that contain duplicated genes, under three matching models and three similarity measures. It establishes hardness results and provides algorithms for specific cases in comparative genomics.
Contribution
It proves that computing a genome matching that optimizes a similarity measure is APX-hard, and offers polynomial-time solutions and approximation algorithms for certain models and measures.
Findings
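For each matching model and each measure studied, computing a matching that optimizes the measure is APX-hard.
Deciding whether two genomes can be matched so that no breakpoint is left is NP-complete in the exemplar model.
The same zero-breakpoint decision problem is solvable in polynomial time in the maximum matching model.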
Abstract
A central problem in comparative genomics consists in computing a (dis-)similarity measure between two genomes, e.g. in order to construct a phylogeny. All the existing measures are defined on genomes without duplicates. However, we know that genes can be duplicated within the same genome. One possible approach to overcome this difficulty is to establish a one-to-one correspondence (i.e. a matching) between genes of both genomes, where the correspondence is chosen in order to optimize the studied measure. In this paper, we are interested in three measures (number of breakpoints, number of common intervals and number of conserved intervals) and three models of matching (exemplar, intermediate and maximum matching models). We prove that, for each model and each measure M, computing a matching between two genomes that optimizes M is APX-hard. We also study the complexity of the following…
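To make the exemplar model and the breakpoint measure concrete, here is a minimal Python sketch. It rests on assumptions that are not stated in the text above: genomes are lists of signed integers, a gene family is the absolute value of an entry, and a pair of consecutive genes (a, b) in the first genome counts as a breakpoint when neither (a, b) nor (-b, -a) is consecutive in the second. The function names are hypothetical, and the exhaustive search is exponential in the number of duplicates, so it only illustrates the problem on tiny inputs; it is not an algorithm from the paper.

```python
from itertools import product


def breakpoints(g1, g2):
    """Count breakpoints between two signed, duplicate-free genomes
    over the same gene set (assumed definition, see lead-in)."""
    adjacencies = set()
    for a, b in zip(g2, g2[1:]):
        adjacencies.add((a, b))      # adjacency read left to right
        adjacencies.add((-b, -a))    # same adjacency on the reverse strand
    return sum((a, b) not in adjacencies for a, b in zip(g1, g1[1:]))


def exemplar_min_breakpoints(g1, g2):
    """Brute-force exemplar model: keep exactly one copy of every shared
    gene family in each genome, and minimize the number of breakpoints."""
    families = sorted({abs(x) for x in g1} & {abs(x) for x in g2})

    def positions(genome, family):
        return [i for i, x in enumerate(genome) if abs(x) == family]

    choices1 = [positions(g1, f) for f in families]
    choices2 = [positions(g2, f) for f in families]

    best = None
    for pick1 in product(*choices1):
        e1 = [g1[i] for i in sorted(pick1)]      # exemplarized first genome
        for pick2 in product(*choices2):
            e2 = [g2[i] for i in sorted(pick2)]  # exemplarized second genome
            d = breakpoints(e1, e2)
            if best is None or d < best:
                best = d
    return best


if __name__ == "__main__":
    # Gene 1 is duplicated in the first genome; keeping its second copy
    # yields 2 1 3, which matches the second genome with no breakpoint.
    print(exemplar_min_breakpoints([1, 2, 1, 3], [2, 1, 3]))
```

The search tries every way of keeping one copy per family, which is exactly the choice the exemplar model leaves open; the hardness results summarized above suggest that no efficient general method should be expected for this optimization.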
