Phylogenetic mixtures: Concentration of measure in the large-tree limit

Elchanan Mossel; Sebastien Roch

arXiv:1108.3112·math.PR·November 30, 2012

TL;DR

Using concentration of measure techniques, this paper shows that mixtures of large phylogenetic trees are typically identifiable, and it derives sequence-length requirements for high-probability reconstruction.

Contribution

The paper applies concentration of measure to show that mixtures of large phylogenetic trees are typically identifiable, and it derives sequence-length requirements for high-probability reconstruction.

Findings

01. Mixtures of large trees are typically identifiable.

02. Sequence-length requirements for high-probability reconstruction are derived.

03. Concentration of measure techniques are the tool used to establish identifiability.

Abstract

The reconstruction of phylogenies from DNA or protein sequences is a major task of computational evolutionary biology. Common phenomena, notably variations in mutation rates across genomes and incongruences between gene lineage histories, often make it necessary to model molecular data as originating from a mixture of phylogenies. Such mixed models play an increasingly important role in practice. Using concentration of measure techniques, we show that mixtures of large trees are typically identifiable. We also derive sequence-length requirements for high-probability reconstruction.
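
To make the notion of a mixture of phylogenies concrete, here is a minimal sketch, not the authors' construction, of how sites could be drawn from a mixture of trees. It assumes a two-state symmetric substitution model in which each edge flips the state with a fixed probability. The function names, the tree encoding, and all parameter values are hypothetical and chosen only for illustration.

```python
import random

def sample_site(tree, root, rng):
    """Sample leaf states for one site on a single tree.

    Assumed encoding (hypothetical): `tree` maps a node name to a list of
    (child, flip_probability) pairs; nodes absent from `tree` are leaves.
    """
    states = {root: rng.choice((0, 1))}  # uniform state at the root
    leaves = {}
    stack = [root]
    while stack:
        node = stack.pop()
        children = tree.get(node, [])
        if not children:
            leaves[node] = states[node]
        for child, flip_prob in children:
            # The state flips along the edge with probability flip_prob.
            flipped = rng.random() < flip_prob
            states[child] = states[node] ^ int(flipped)
            stack.append(child)
    return leaves

def sample_mixture(trees, weights, n_sites, seed=0):
    """Draw n_sites independent sites; each site first picks a tree
    according to `weights`, then evolves along that tree."""
    rng = random.Random(seed)
    sites = []
    for _ in range(n_sites):
        k = rng.choices(range(len(trees)), weights=weights)[0]
        tree, root = trees[k]
        sites.append((k, sample_site(tree, root, rng)))
    return sites

# Two four-leaf trees with different topologies: ab|cd and ac|bd.
# Flip probabilities are illustrative assumptions.
tree_abcd = ({"r": [("u", 0.1), ("v", 0.1)],
              "u": [("a", 0.1), ("b", 0.1)],
              "v": [("c", 0.1), ("d", 0.1)]}, "r")
tree_acbd = ({"r": [("u", 0.1), ("v", 0.1)],
              "u": [("a", 0.1), ("c", 0.1)],
              "v": [("b", 0.1), ("d", 0.1)]}, "r")

example_sites = sample_mixture([tree_abcd, tree_acbd], [0.5, 0.5], n_sites=5)
for component, leaf_states in example_sites:
    print(component, dict(sorted(leaf_states.items())))
```

In real data the component index of each site is hidden. The identifiability question in the abstract is whether the trees can still be recovered from the leaf states alone.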
