On the inference of large phylogenies with long branches: How long is too long?
Elchanan Mossel, Sebastien Roch, Allan Sly

TL;DR
This paper investigates the sequence-length requirements for accurately reconstructing large phylogenies with long branches under GTR models, revealing a phase transition at the critical branch length and demonstrating the limits of current methods.
Contribution
The paper extends the understanding of sequence-length thresholds for phylogeny reconstruction from the CFN model to GTR models, showing a gap between the KS bound and the MLE threshold and providing a reconstruction algorithm for intermediate branch lengths.
Findings
Reconstruction is possible with sequences of length O(log n) when branch lengths are below the KS bound.
The paper exhibits a family of models for which reconstruction remains feasible for branch lengths between the KS and MLE thresholds.
Polynomial sequence length is necessary for branches longer than the MLE threshold.
Abstract
Recent work has highlighted deep connections between sequence-length requirements for high-probability phylogeny reconstruction and the related problem of the estimation of ancestral sequences. In [Daskalakis et al.'09], building on the work of [Mossel'04], a tight sequence-length requirement was obtained for the CFN model. In particular, the required sequence length for high-probability reconstruction was shown to undergo a sharp transition (from O(log n) to poly(n), where n is the number of leaves) at the "critical" branch length (if it exists) of the ancestral reconstruction problem. Here we consider the GTR model. For this model, recent results of [Roch'09] show that the tree can be accurately reconstructed with sequences of length O(log n) when the branch lengths are below a threshold known as the Kesten-Stigum (KS) bound. Although for the CFN model…
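As a rough numerical illustration of the KS bound mentioned above, the sketch below evaluates it for the two-state CFN model on a complete binary tree. It assumes the standard form of the Kesten-Stigum condition, b * theta^2 > 1, where b = 2 is the branching number and theta = 1 - 2p is the second eigenvalue of an edge with flip probability p. It also assumes the common convention of measuring branch length as tau = -ln(1 - 2p). The function names are hypothetical, and nothing here is taken from the paper's own algorithm.

```python
import math


def cfn_theta(p: float) -> float:
    """Second eigenvalue of a two-state symmetric (CFN) edge with flip probability p."""
    return 1.0 - 2.0 * p


def cfn_branch_length(p: float) -> float:
    """Additive branch length tau = -ln(1 - 2p) (assumed convention), for 0 <= p < 1/2."""
    return -math.log(cfn_theta(p))


def below_ks_bound(p: float, branching: int = 2) -> bool:
    """True when the edge satisfies the Kesten-Stigum condition b * theta^2 > 1."""
    return branching * cfn_theta(p) ** 2 > 1.0


# Critical values for a binary tree, where 2 * theta^2 = 1.
theta_star = 1.0 / math.sqrt(2.0)
p_star = (1.0 - theta_star) / 2.0      # critical flip probability
tau_star = -math.log(theta_star)       # critical branch length, equal to ln(2) / 2

if __name__ == "__main__":
    print(f"critical theta         = {theta_star:.6f}")
    print(f"critical flip prob.    = {p_star:.6f}")
    print(f"critical branch length = {tau_star:.6f}")
    for p in (0.10, 0.20):
        print(p, cfn_branch_length(p), below_ks_bound(p))
```

For a general GTR rate matrix, the same condition would be stated in terms of the second eigenvalue of the edge transition matrix rather than 1 - 2p. That generalization is not implemented in this sketch.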
