Species Trees are Recoverable from Unrooted Gene Tree Topologies Under a Constant Rate of Horizontal Gene Transfer
Constantinos Daskalakis, Sebastien Roch

TL;DR
This paper shows that a species tree can be reconstructed from unrooted gene tree topologies even when horizontal gene transfer (HGT) occurs at a constant rate per gene, closing the gap between previously known lower and upper bounds on the tolerable rate up to a constant.
Contribution
It introduces a new polynomial-time algorithm for reconstructing species trees from unrooted gene trees under high HGT rates, and provides bounds on the limits of this reconstruction that match up to a constant.
Findings
Species trees are recoverable with constant HGT rates per gene.
The proposed algorithm works with unrooted gene trees and is computationally efficient.
Theoretical bounds on the limits of reconstruction are tight up to a constant.
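To make concrete what kind of signal an unrooted gene tree topology carries, here is a minimal sketch that reads the split induced on four taxa from each gene tree and takes a plurality vote across genes. This is an illustration only and is not the algorithm proposed in the paper. The edge-list tree encoding, the function names, and the toy gene trees are all assumptions introduced for this example.

```python
from collections import Counter, deque
from itertools import combinations


def build_adjacency(edges):
    """Turn a list of undirected edges into an adjacency map."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def path_lengths(adj, source):
    """Breadth-first search: number of edges from source to every node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def quartet_split(adj, quartet):
    """Return the split the unrooted tree induces on four leaves.

    Uses the four-point condition on edge-count distances: the pairing
    with the strictly smallest distance sum is the induced split.
    Returns None if the quartet is unresolved (tied sums).
    """
    a, b, c, d = quartet
    dist = {x: path_lengths(adj, x) for x in quartet}
    sums = {
        ((a, b), (c, d)): dist[a][b] + dist[c][d],
        ((a, c), (b, d)): dist[a][c] + dist[b][d],
        ((a, d), (b, c)): dist[a][d] + dist[b][c],
    }
    ranked = sorted(sums.items(), key=lambda item: item[1])
    if ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


def plurality_quartets(gene_trees, taxa):
    """For each set of four taxa, pick the split seen in the most gene trees."""
    adjacencies = [build_adjacency(edges) for edges in gene_trees]
    winners = {}
    for quartet in combinations(sorted(taxa), 4):
        votes = Counter()
        for adj in adjacencies:
            split = quartet_split(adj, quartet)
            if split is not None:
                votes[split] += 1
        winners[quartet] = votes.most_common(1)[0][0] if votes else None
    return winners


# Hypothetical toy data: two gene trees agree on AB|CD, while one gene
# (imagined here as affected by a transfer) shows AC|BD instead.
tree_ab_cd = [("A", "x"), ("B", "x"), ("x", "y"), ("C", "y"), ("D", "y")]
tree_ac_bd = [("A", "x"), ("C", "x"), ("x", "y"), ("B", "y"), ("D", "y")]
gene_trees = [tree_ab_cd, tree_ab_cd, tree_ac_bd]

winners = plurality_quartets(gene_trees, {"A", "B", "C", "D"})
print(winners[("A", "B", "C", "D")])
```

No root is needed at any step: the split on four leaves is determined by the unrooted topology alone. Ties in the vote are broken arbitrarily in this sketch, and plurality voting is used only for illustration, with no claim that it achieves the guarantees summarized above.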
Abstract
Reconstructing the tree of life from molecular sequences is a fundamental problem in computational biology. Modern data sets often contain a large number of genes, which can complicate the reconstruction problem because different genes may undergo different evolutionary histories. This is the case in particular in the presence of horizontal genetic transfer (HGT), where a gene is inherited from a distant species rather than an immediate ancestor. Such an event produces a gene tree which is distinct from, but related to, the species phylogeny. In previous work, a natural stochastic model of HGT was introduced and studied. It was shown, both in simulation and theoretical studies, that a species phylogeny can be reconstructed from gene trees despite surprisingly high rates of HGT under this model. Rigorous lower and upper bounds on this achievable rate were also obtained,…
Taxonomy
Topics: Genomics and Phylogenetic Studies · Gene expression and cancer classification · Genome Rearrangement Algorithms
