Long-branch attraction in species tree estimation: inconsistency of partitioned likelihood and topology-based summary methods
Sebastien Roch, Michael Nute, Tandy Warnow

TL;DR
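This paper shows that two common approaches to species tree estimation, fully partitioned maximum likelihood and topology-based summary methods, are statistically inconsistent when the number of sites per locus is bounded, because of long-branch attraction. Existing consistency guarantees assume that both the number of loci and the number of sites per locus increase.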
Contribution
It shows that fully partitioned maximum likelihood and topology-based summary methods for species tree estimation are inconsistent when the sequence length per locus is bounded, a condition the abstract describes as biologically realistic.
Findings
Fully partitioned maximum likelihood is statistically inconsistent when the number of sites per locus is bounded.
Topology-based summary methods, which combine estimated gene trees, are also inconsistent under the same condition.
Long-branch attraction is the source of the bias: with a bounded number of sites per locus, it can mislead species tree estimation even as the number of loci grows.
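To make the long-branch attraction effect concrete, here is a minimal Python sketch of the classic four-taxon case. It is not the paper's argument, which concerns partitioned likelihood and summary methods. As a simpler stand-in, it computes exact site-pattern probabilities under a two-state symmetric (CFN) model on the tree ((A,B),(C,D)) and checks which split the parsimony-informative patterns favor. The function names and the branch change probabilities (0.4 for long branches, 0.05 for short ones) are illustrative assumptions.

```python
from itertools import product


def transition(x, y, p):
    """Probability of observing state y at the end of an edge that starts in
    state x, where p is the per-edge change probability (CFN model)."""
    return p if x != y else 1.0 - p


def pattern_support(p_a, p_b, p_c, p_d, p_internal):
    """Exact probability of each parsimony-informative site pattern on the
    true tree ((A,B),(C,D)). Internal nodes: u above A,B and v above C,D.
    All arguments are hypothetical per-edge change probabilities."""
    support = {"AB|CD": 0.0, "AC|BD": 0.0, "AD|BC": 0.0}
    # Enumerate leaf states (a, b, c, d) and hidden internal states (u, v).
    for a, b, c, d, u, v in product((0, 1), repeat=6):
        prob = (
            0.5  # uniform state distribution at u
            * transition(u, a, p_a)
            * transition(u, b, p_b)
            * transition(u, v, p_internal)
            * transition(v, c, p_c)
            * transition(v, d, p_d)
        )
        if a == b and c == d and a != c:
            support["AB|CD"] += prob  # pattern xxyy supports the true split
        elif a == c and b == d and a != b:
            support["AC|BD"] += prob  # pattern xyxy groups A with C
        elif a == d and b == c and a != b:
            support["AD|BC"] += prob  # pattern xyyx groups A with D
    return support


def favored_split(support):
    """Split with the highest expected number of supporting sites."""
    return max(support, key=support.get)


# Long branches to A and C, on opposite sides of a short internal edge.
unbalanced = pattern_support(0.4, 0.05, 0.4, 0.05, 0.05)
# All branches short, for comparison.
balanced = pattern_support(0.05, 0.05, 0.05, 0.05, 0.05)

print(favored_split(unbalanced), unbalanced)
print(favored_split(balanced), balanced)
```

Under these assumed values, the two long branches are grouped together (AC|BD) even though the generating tree is AB|CD, while the all-short-branch case recovers the true split. Because the pattern probabilities are exact expectations, adding more sites would not correct the unbalanced case for this parsimony-style criterion.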
Abstract
With advances in sequencing technologies, there are now massive amounts of genomic data from across all life, leading to the possibility that a robust Tree of Life can be constructed. However, "gene tree heterogeneity", which is when different genomic regions can evolve differently, is a common phenomenon in multi-locus datasets, and reduces the accuracy of standard methods for species tree estimation that do not take this heterogeneity into account. New methods have been developed for species tree estimation that specifically address gene tree heterogeneity, and that have been proven to converge to the true species tree when the number of loci and number of sites per locus both increase (i.e., the methods are said to be "statistically consistent"). Yet, little is known about the biologically realistic condition where the number of sites per locus is bounded. We show that when the…
