Phylogenomics with Paralogs
Marc Hellmuth, Nicolas Wieseke, Marcus Lechner, Hans-Peter Lenhof, Martin Middendorf, Peter F. Stadler

TL;DR
This paper demonstrates that paralogous genes, traditionally treated as noise, are a valuable source of phylogenetic information for inferring species trees, even in the presence of horizontal gene transfer.
Contribution
It introduces a method to utilize paralogs in phylogenomics, relaxing the need for exclusively orthologous data sets and improving phylogenetic inference.
Findings
Paralogs contain sufficient phylogenetic signal for species tree inference.
Genome-wide data with paralogs can produce fully resolved phylogenies.
The method is robust against horizontal gene transfer.
Abstract
Phylogenomics heavily relies on well-curated sequence data sets that consist, for each gene, exclusively of 1:1 orthologs. Paralogs are treated as a dangerous nuisance that has to be detected and removed. We show here that this severe restriction of the data sets is not necessary. Building upon recent advances in mathematical phylogenetics, we demonstrate that gene duplications convey meaningful phylogenetic information and allow the inference of plausible phylogenetic trees, provided orthologs and paralogs can be distinguished with a degree of certainty. Starting from tree-free estimates of orthology, cograph editing can sufficiently reduce the noise in order to find correct event-annotated gene trees. The information of gene trees can then directly be translated into constraints on the species trees. While the resolution is very poor for individual gene families, we show that…
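The cograph step mentioned in the abstract can be made concrete with a small check. A graph is a cograph exactly when it contains no induced path on four vertices (P4), so an estimated orthology relation that contains such a path cannot be explained by an event-annotated gene tree without first being edited. The sketch below only detects that obstruction by brute force; it does not perform cograph editing, and it is not the authors' implementation. The names `find_induced_p4`, `is_cograph`, `genes`, and `ortholog_pairs` are hypothetical and chosen for illustration.

```python
from itertools import permutations


def find_induced_p4(genes, ortholog_pairs):
    """Return one induced path a-b-c-d in the orthology graph, or None.

    genes: iterable of gene identifiers (graph vertices).
    ortholog_pairs: iterable of 2-tuples of genes estimated to be orthologs
    (undirected edges). Both inputs are illustrative assumptions.
    """
    genes = list(genes)
    adj = {g: set() for g in genes}
    for u, v in ortholog_pairs:
        adj[u].add(v)
        adj[v].add(u)

    # Brute force over ordered 4-tuples; adequate for small gene families only.
    for a, b, c, d in permutations(genes, 4):
        path_edges = b in adj[a] and c in adj[b] and d in adj[c]
        no_chords = c not in adj[a] and d not in adj[a] and d not in adj[b]
        if path_edges and no_chords:
            return (a, b, c, d)
    return None


def is_cograph(genes, ortholog_pairs):
    """A graph is a cograph if and only if it has no induced P4."""
    return find_induced_p4(genes, ortholog_pairs) is None


if __name__ == "__main__":
    family = ["a", "b", "c", "d"]
    noisy = [("a", "b"), ("b", "c"), ("c", "d")]   # a plain path: a P4
    edited = noisy + [("a", "d")]                   # one added edge removes it
    print(is_cograph(family, noisy))
    print(is_cograph(family, edited))
```

In this toy family, a single edge insertion turns the estimate into a cograph. Cograph editing asks for the smallest such set of edge insertions and deletions, which is a much harder optimization problem than the detection shown here.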
