The gene family-free median of three
Daniel Doerr, Pedro Feijao, Metin Balaban, Cedric Chauve

TL;DR
This paper introduces a gene family-free approach for computing the median of three genomes directly from a sequence similarity graph, bypassing the need for prior gene family assignment.
Contribution
It develops a model for median genome construction in the family-free setting, gives an exact solution method (a 0-1 linear program), and validates it through simulations and comparison with the OMA orthology database.
Findings
Accurately computes medians and positional orthologs for bacterial-sized genomes.
The problem is MAX SNP-hard, but can be solved exactly with a 0-1 linear program.
The method outperforms existing approaches in accuracy for gene order analysis.
Abstract
The gene family-free framework for comparative genomics aims at developing methods for gene order analysis that do not require prior gene family assignment, but work directly on a sequence similarity multipartite graph. We present a model for constructing a median of three genomes in this family-free setting, based on maximizing an objective function that generalizes the classical breakpoint distance by integrating sequence similarity in the score of a gene adjacency. We show that the corresponding computational problem is MAX SNP-hard and we present a 0-1 linear program for its exact solution. The result of our FF-median program is a median genome with median genes associated to extant genes, in which median adjacencies are assumed to define positional orthologs. We demonstrate through simulations and comparison with the OMA orthology database that the herein presented method is able…
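To make the idea of a similarity-weighted adjacency score concrete, here is a minimal Python sketch that scores a fixed candidate median against three extant genomes. It is not the paper's FF-median objective or its 0-1 linear program. Several parts are hypothetical simplifications: the mean of the two median genes' similarity weights as the adjacency weight, the `assoc` mapping from each median gene to one extant gene per genome, the assumption that a median gene and its associated extant gene share orientation, and linear single-chromosome genomes. When every weight is 1, the score reduces to the number of median adjacencies conserved across the three genomes, which is the complement of the classical breakpoint count.

```python
from typing import Dict, FrozenSet, List, Tuple

SignedGene = Tuple[str, int]       # (gene id, +1 or -1 strand)
Extremity = Tuple[str, str]        # (gene id, "t" = tail, "h" = head)
Assoc = Dict[str, Tuple[str, float]]  # HYPOTHETICAL: median gene -> (extant gene, similarity weight)


def right_ext(gene: SignedGene) -> Extremity:
    """Extremity on the right side of a gene as read along the chromosome."""
    name, strand = gene
    return (name, "h" if strand == 1 else "t")


def left_ext(gene: SignedGene) -> Extremity:
    """Extremity on the left side of a gene as read along the chromosome."""
    name, strand = gene
    return (name, "t" if strand == 1 else "h")


def adjacencies(order: List[SignedGene]) -> List[FrozenSet[Extremity]]:
    """Unordered extremity pairs for consecutive genes on a linear chromosome."""
    return [frozenset({right_ext(a), left_ext(b)}) for a, b in zip(order, order[1:])]


def ff_score(median: List[SignedGene], extant: List[List[SignedGene]],
             assoc: List[Assoc]) -> float:
    """Sum, over extant genomes, of the weights of median adjacencies that are
    conserved in that genome. The weighting scheme is an assumption."""
    total = 0.0
    for k, order in enumerate(extant):
        conserved = set(adjacencies(order))
        for a, b in zip(median, median[1:]):
            (m, x), (n, y) = right_ext(a), left_ext(b)
            gm, wm = assoc[k][m]
            gn, wn = assoc[k][n]
            # ASSUMPTION: median extremity (m, x) maps to extant extremity (gm, x).
            if frozenset({(gm, x), (gn, y)}) in conserved:
                total += (wm + wn) / 2
    return total


# Toy instance (invented for illustration only).
MEDIAN = [("m1", 1), ("m2", 1), ("m3", 1)]
EXTANT = [
    [("a1", 1), ("a2", 1), ("a3", 1)],      # same order as the median
    [("b1", 1), ("b3", 1), ("b2", 1)],      # rearranged: no adjacency conserved
    [("c3", -1), ("c2", -1), ("c1", -1)],   # reversed chromosome: adjacencies conserved
]
ASSOC = [
    {"m1": ("a1", 1.0), "m2": ("a2", 0.75), "m3": ("a3", 0.5)},
    {"m1": ("b1", 1.0), "m2": ("b2", 1.0), "m3": ("b3", 1.0)},
    {"m1": ("c1", 0.5), "m2": ("c2", 0.5), "m3": ("c3", 0.5)},
]

if __name__ == "__main__":
    print(ff_score(MEDIAN, EXTANT, ASSOC))
```

In this toy instance the first genome contributes 0.875 + 0.625, the second contributes nothing, and the reversed third genome contributes 0.5 + 0.5, for a total of 2.5. Replacing every weight with 1.0 gives 4.0, the plain count of conserved adjacencies. Finding the median that maximizes such a score, rather than scoring a given one, is the optimization problem that the paper shows to be MAX SNP-hard.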
