Reconstructing Ultrametric Trees from Noisy Experiments
Eshwar Ram Arunachaleswaran, Anindya De, Sampath Kannan

TL;DR
This paper develops efficient algorithms for reconstructing ultrametric evolutionary trees from noisy experimental data, establishing conditions under which topology and edge weights can be accurately recovered despite stochastic noise.
Contribution
It introduces a new noise model and provides algorithms with provable guarantees for reconstructing tree topology and weights when edges are sufficiently long.
Findings
Reconstruction is feasible when edges are at least O(1/√n) in length.
Topology reconstruction becomes impossible if edges are shorter than this threshold.
Edge weights can be approximately reconstructed under the same conditions for a specific noise model.
Abstract
The problem of reconstructing evolutionary trees or phylogenies is of great interest in computational biology. A popular model for this problem assumes that we are given the set of leaves (current species) of an unknown binary tree and the results of 'experiments' on triples of leaves (a,b,c), which return the pair with the deepest least common ancestor. If the tree is assumed to be an ultrametric (i.e., all root-leaf paths have the same length), the experiment can be equivalently seen to return the closest pair of leaves. In this model, efficient algorithms are known for tree reconstruction. In reality, since the data on which these 'experiments' are run is itself generated by the stochastic process of evolution, these experiments are noisy. In all reasonable models of evolution, if the branches leading to the leaves in a triple separate from each other at common ancestors that are…
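The noiseless triple experiment described in the abstract can be illustrated with a small sketch. This is not the paper's algorithm or its noise model. The tree encoding (a dictionary mapping each node to its parent and height), the example tree TREE, and the function names are all assumptions made for illustration. The sketch shows that, in an ultrametric tree, the pair with the deepest least common ancestor is also the closest pair of leaves.

```python
from itertools import combinations

# Hypothetical encoding: node -> (parent, height). The height is the distance
# from the node down to any leaf below it, which is well defined in an
# ultrametric tree. Leaves have height 0 and the root has parent None.
TREE = {
    "root": (None, 3.0),
    "x": ("root", 2.0),
    "y": ("x", 1.0),
    "a": ("y", 0.0),
    "b": ("y", 0.0),
    "c": ("x", 0.0),
    "d": ("root", 0.0),
}

def ancestors(tree, node):
    """Return the path from node up to the root, inclusive."""
    path = []
    while node is not None:
        path.append(node)
        node = tree[node][0]
    return path

def lca(tree, u, v):
    """Return the least common ancestor of u and v."""
    seen = set(ancestors(tree, u))
    for w in ancestors(tree, v):
        if w in seen:
            return w
    raise ValueError("nodes are not in the same tree")

def distance(tree, u, v):
    """Leaf-to-leaf distance: twice the height of the LCA."""
    return 2.0 * tree[lca(tree, u, v)][1]

def triple_experiment(tree, a, b, c):
    """Noiseless experiment: return the pair with the deepest LCA,
    which in an ultrametric tree is the closest pair of leaves."""
    return min(combinations((a, b, c), 2), key=lambda p: distance(tree, *p))
```

On this example tree, triple_experiment(TREE, "a", "b", "c") returns the pair ("a", "b"), since a and b share the lowest common ancestor y. A noisy variant would return this pair only with some probability that depends on the tree, which is the setting the paper studies.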
Taxonomy
Topics: Topological and Geometric Data Analysis · Single-cell and spatial transcriptomics
