Statistics for Phylogenetic Trees in the Presence of Stickiness
Lars Lammers, Tom M. W. Nye, Stephan F. Huckemann

TL;DR
This paper investigates the statistical properties and computational challenges of the Fréchet mean in phylogenetic tree space, especially under the influence of stickiness, and introduces new methods and hypothesis tests for analyzing such trees.
Contribution
It presents novel methods for identifying edges in the Fréchet mean, explores the impact of stickiness on asymptotics, and develops hypothesis tests for tree data with complex topologies.
Findings
New algorithms for edge identification in Fréchet means
Demonstration of stickiness effects in biological and medical data
Hypothesis tests robust to the presence of stickiness
Abstract
Samples of phylogenetic trees arise in a variety of evolutionary and biomedical applications, and the Fréchet mean in Billera-Holmes-Vogtmann tree space is a summary tree shown to have advantages over other mean or consensus trees. However, use of the Fréchet mean raises computational and statistical issues which we explore in this paper. The Fréchet sample mean is known often to contain fewer internal edges than the trees in the sample, and in this circumstance calculating the mean by iterative schemes can be problematic due to slow convergence. We present new methods for identifying edges which must lie in the Fréchet sample mean and apply these to a data set of gene trees relating organisms from the apicomplexa which cause a variety of parasitic infections. When a sample of trees contains a significant level of heterogeneity in the branching patterns, or topologies, displayed…
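The stickiness mentioned in the abstract can be seen in the smallest tree space. The sketch below is a toy illustration, not the paper's edge-identification method: it assumes the 3-spider (three half-lines glued at an origin), which is the Billera-Holmes-Vogtmann space for rooted trees with three leaves, where each leg is one topology, the coordinate is the internal edge length, and the origin is the star tree with no internal edge. The names `spider_distance`, `frechet_function` and `frechet_mean_spider` are hypothetical. On this space the Fréchet mean has a closed form: it lies on leg k only if the edge lengths on leg k sum to more than those on all other legs combined, and otherwise it sits at the origin.

```python
from typing import List, Tuple

# A point on the 3-spider: (leg index, distance from the origin).
Point = Tuple[int, float]


def spider_distance(p: Point, q: Point) -> float:
    """Geodesic distance: along the leg if shared, through the origin otherwise."""
    (leg_p, r_p), (leg_q, r_q) = p, q
    if leg_p == leg_q or r_p == 0.0 or r_q == 0.0:
        return abs(r_p - r_q)
    return r_p + r_q


def frechet_function(x: Point, sample: List[Point]) -> float:
    """Sum of squared distances from x to the sample points."""
    return sum(spider_distance(x, p) ** 2 for p in sample)


def frechet_mean_spider(sample: List[Point], legs: int = 3) -> Point:
    """Closed-form Fréchet mean on the spider (toy case only)."""
    n = len(sample)
    total = sum(r for _, r in sample)
    for k in range(legs):
        s_k = sum(r for leg, r in sample if leg == k)
        # Stationary point of the Fréchet function restricted to leg k.
        m_k = (s_k - (total - s_k)) / n
        if m_k > 0:
            return (k, m_k)  # at most one leg can satisfy this
    return (0, 0.0)  # origin (star tree); the leg index is arbitrary here


balanced = [(0, 1.0), (1, 1.0), (2, 1.0)]
perturbed = [(0, 1.5), (1, 1.0), (2, 1.0)]   # one edge length nudged
dominated = [(0, 3.0), (1, 1.0), (2, 1.0)]   # leg 0 outweighs the rest

print(frechet_mean_spider(balanced))   # origin
print(frechet_mean_spider(perturbed))  # still the origin: the mean "sticks"
print(frechet_mean_spider(dominated))  # moves onto leg 0
```

In the first two samples the mean has fewer internal edges (none) than any tree in the sample, and perturbing the data does not move it. This is the behaviour that complicates both iterative computation and classical asymptotics.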
Taxonomy
Topics: Genomics and Phylogenetic Studies · Evolution and Paleontology Studies · Fractal and DNA sequence analysis
Methods: Sparse Evolutionary Training
