Phase transition in the sample complexity of likelihood-based phylogeny inference

Sebastien Roch; Allan Sly

arXiv:1508.01964·math.PR·July 20, 2017

TL;DR

This paper establishes tight bounds on the amount of sequence data that maximum likelihood needs to reconstruct a phylogeny correctly, and shows that the estimator can be computed efficiently on randomly generated data under certain conditions.

Contribution

It provides the first matching upper and lower bounds on the sequence-length requirement of maximum likelihood phylogeny reconstruction: logarithmic in the number of tips below the Kesten-Stigum threshold, and polynomial in general.

Findings

01

Below the Kesten-Stigum threshold, the sequence-length requirement is logarithmic in the number of tips.

02

Sequence-length requirement is polynomial in the number of tips in general.

03

The maximum likelihood estimator can be computed efficiently on randomly generated data, provided the sequences are long enough to meet the requirements above.
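
To make the terms "tips" and "sequence length" in the findings above concrete, here is a minimal simulation sketch. It is not the paper's method and performs no maximum likelihood inference. It assumes a complete binary tree, an r-state symmetric substitution model, and a single per-edge change probability; the names `simulate_tips`, `hamming_fraction`, `depth`, `seq_len`, and `p_change` are hypothetical and chosen only for illustration.

```python
import random


def simulate_tips(depth, seq_len, r=4, p_change=0.1, seed=0):
    """Evolve seq_len independent sites down a complete binary tree.

    Assumptions (illustrative only): every edge has the same change
    probability p_change, and a changed site moves to one of the other
    r - 1 states uniformly at random (an r-state symmetric model).
    Returns a list of 2**depth tip sequences, each a list of ints.
    """
    rng = random.Random(seed)
    # Root sequence: each site is uniform over the r states.
    level = [[rng.randrange(r) for _ in range(seq_len)]]
    for _ in range(depth):
        next_level = []
        for parent in level:
            for _child in range(2):  # binary tree: two children per node
                child = []
                for state in parent:
                    if rng.random() < p_change:
                        # Pick uniformly among the r - 1 other states.
                        t = rng.randrange(r - 1)
                        child.append(t if t < state else t + 1)
                    else:
                        child.append(state)
                next_level.append(child)
        level = next_level
    return level


def hamming_fraction(a, b):
    """Fraction of sites at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


tips = simulate_tips(depth=3, seq_len=2000)  # n = 8 tips, 2000 sites each
sibling_diff = hamming_fraction(tips[0], tips[1])   # tips sharing a parent
distant_diff = hamming_fraction(tips[0], tips[-1])  # tips separated by the root
```

In this sketch, the number of tips n is `2**depth` and the sequence length is `seq_len`. Sibling tips differ at fewer sites than tips separated by the root, and that signal becomes noisier as sequences get shorter. The sequence-length requirement in the findings asks how large the sequence length must be, as a function of n, for reconstruction to succeed.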

Abstract

Reconstructing evolutionary trees from molecular sequence data is a fundamental problem in computational biology. Stochastic models of sequence evolution are closely related to spin systems that have been extensively studied in statistical physics and that connection has led to important insights on the theoretical properties of phylogenetic reconstruction algorithms as well as the development of new inference methods. Here, we study maximum likelihood, a classical statistical technique which is perhaps the most widely used in phylogenetic practice because of its superior empirical accuracy. At the theoretical level, except for its consistency, that is, the guarantee of eventual correct reconstruction as the size of the input data grows, much remains to be understood about the statistical properties of maximum likelihood in this context. In particular, the best bounds on the sample…
