Data Requirement for Phylogenetic Inference from Multiple Loci: A New Distance Method
Gautam Dasarathy, Robert Nowak, and Sebastien Roch

TL;DR
This paper introduces a new distance-based method for phylogenetic inference from multiple loci that accounts for gene tree estimation errors, and shows that it provably improves over previous methods in a regime of interest.
Contribution
It provides the first full data-requirement analysis of a species tree reconstruction method that takes gene-level estimation errors into account, and it proposes a novel reconstruction algorithm that provably improves over previous methods in a regime of interest.
Findings
The new method provably improves over all previous methods in a regime of interest.
The analysis quantifies the data required to reconstruct a species tree from multiple loci.
It accounts for the errors introduced by estimating gene trees from molecular sequences of finite length.
Abstract
We consider the problem of estimating the evolutionary history of a set of species (phylogeny or species tree) from several genes. It is known that the evolutionary history of individual genes (gene trees) might be topologically distinct from each other and from the underlying species tree, possibly confounding phylogenetic analysis. A further complication in practice is that one has to estimate gene trees from molecular sequences of finite length. We provide the first full data-requirement analysis of a species tree reconstruction method that takes into account estimation errors at the gene level. Under that criterion, we also devise a novel reconstruction algorithm that provably improves over all previous methods in a regime of interest.
