Inferring Species Trees from Incongruent Multi-Copy Gene Trees Using the Robinson-Foulds Distance
Ruchi Chaudhary, J. Gordon Burleigh, David Fernández-Baca

TL;DR
This paper introduces a novel method for inferring species trees from multi-copy gene trees by generalizing the Robinson-Foulds distance, accommodating incongruences from various biological processes without assuming a specific cause.
Contribution
The paper defines a generalized RF distance for multi-labeled trees (mul-trees) and formulates the MulRF supertree problem, for which it provides a fast heuristic algorithm reported to be more accurate than gene tree parsimony methods.
Findings
MulRF outperforms gene tree parsimony methods in accuracy.
The heuristic algorithm is efficient for large datasets.
The method handles incongruence from multiple biological processes.
Abstract
We present a new method for inferring species trees from multi-copy gene trees. Our method is based on a generalization of the Robinson-Foulds (RF) distance to multi-labeled trees (mul-trees), i.e., gene trees in which multiple leaves can have the same label. Unlike most previous phylogenetic methods using gene trees, this method does not assume that gene tree incongruence is caused by a single, specific biological process, such as gene duplication and loss, deep coalescence, or lateral gene transfer. We prove that it is NP-hard to compute the RF distance between two mul-trees, but it is easy to calculate the generalized RF distance between a mul-tree and a singly-labeled tree. Motivated by this observation, we formulate the RF supertree problem for mul-trees (MulRF), which takes a collection of mul-trees and constructs a species tree that minimizes the total RF distance from the input…
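For orientation, the classical RF distance that the abstract generalizes can be sketched in a few lines. This is a minimal illustration for rooted, singly-labeled trees only, and it is not the paper's generalized distance for mul-trees. The nested-tuple tree encoding and the function names `clusters` and `rf_distance` are assumptions made for this example.

```python
from typing import FrozenSet, Set, Tuple, Union

# Hypothetical encoding: a leaf is a label string, an internal node is a
# tuple of child subtrees, e.g. (("A", "B"), ("C", "D")).
Tree = Union[str, Tuple["Tree", ...]]


def clusters(tree: Tree) -> Tuple[FrozenSet[str], Set[FrozenSet[str]]]:
    """Return the leaf set and the non-trivial clusters of a rooted tree.

    A cluster is the set of leaf labels below an internal node. Labels are
    assumed to be unique (singly-labeled tree); repeated labels would be
    collapsed by the sets used here, so mul-trees are NOT handled.
    """
    found: Set[FrozenSet[str]] = set()

    def visit(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(visit(child) for child in node))
        found.add(leaves)
        return leaves

    all_leaves = visit(tree)
    # The root cluster is shared by every tree on the same leaf set.
    found.discard(all_leaves)
    return all_leaves, found


def rf_distance(a: Tree, b: Tree) -> int:
    """Classical RF distance: size of the symmetric difference of clusters."""
    leaves_a, clusters_a = clusters(a)
    leaves_b, clusters_b = clusters(b)
    if leaves_a != leaves_b:
        raise ValueError("trees must have the same leaf set")
    return len(clusters_a ^ clusters_b)


balanced = (("A", "B"), ("C", "D"))
caterpillar = ((("A", "B"), "C"), "D")
print(rf_distance(balanced, caterpillar))
```

Here `balanced` has the clusters {A, B} and {C, D}, while `caterpillar` has {A, B} and {A, B, C}; they share one cluster and differ in one each. Using sets makes the limitation explicit: once a label can occur on several leaves, as in the gene trees the abstract describes, this cluster comparison no longer applies directly.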
Taxonomy
Topics: Genomics and Phylogenetic Studies · Bioinformatics and Genomic Networks · Genetic diversity and population structure
