Fast reconstruction of phylogenetic trees using locality-sensitive hashing
Daniel G. Brown, Jakub Truszkowski

TL;DR
This paper introduces a sub-quadratic-time algorithm that reconstructs phylogenetic trees from short sequences. It uses locality-sensitive hashing and comes with a theoretical guarantee that the reconstructed tree is correct with high probability.
Contribution
The authors develop the first sub-quadratic-time algorithm that provably reconstructs the correct phylogenetic tree from short sequences with high probability. Earlier fast heuristics for very large alignments lack such performance guarantees.
Findings
Running time is close to linear for phylogenies with very short branches
High accuracy in large-scale phylogeny reconstruction
Example regime cited for fast running time: all branch lengths corresponding to a mutation probability below 0.02
Abstract
We present the first sub-quadratic time algorithm that with high probability correctly reconstructs phylogenetic trees for short sequences generated by a Markov model of evolution. Due to rapid expansion in sequence databases, such very fast algorithms are becoming necessary. Other fast heuristics have been developed for building trees from very large alignments (Price et al. and Brown et al.), but they lack theoretical performance guarantees. Our new algorithm runs in $O(n^{1+\gamma(g)} \log^2 n)$ time, where $\gamma$ is an increasing function of an upper bound on the branch lengths in the phylogeny, the upper bound $g$ must be below $1/2 - \sqrt{1/8} \approx 0.15$, and $\gamma(g) < 1$ for all $g$. For phylogenies with very short branches, the running time of our algorithm is close to linear. For example, if all branch lengths correspond to a mutation probability of less than 0.02, the running…
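The abstract does not describe the hashing scheme itself, so the following is only a generic illustration of locality-sensitive hashing on aligned sequences, not the authors' algorithm. It assumes equal-length sequences and uses bit sampling, a standard LSH family for Hamming distance: each table hashes a sequence by the characters at a few randomly chosen sites, so similar sequences tend to share a bucket while dissimilar ones rarely do. The names `lsh_candidate_pairs`, `k`, and `tables` are hypothetical.

```python
import random
from collections import defaultdict


def lsh_candidate_pairs(seqs, k=8, tables=10, seed=0):
    """Return name pairs that share a bucket in at least one hash table.

    seqs   : dict mapping a name to an aligned sequence (all the same length)
    k      : number of sites sampled per hash key (assumed parameter)
    tables : number of independent hash tables (assumed parameter)
    """
    rng = random.Random(seed)
    length = len(next(iter(seqs.values())))
    pairs = set()
    for _ in range(tables):
        # Bit sampling: the hash key is the sequence restricted to k random sites.
        sites = rng.sample(range(length), k)
        buckets = defaultdict(list)
        for name, seq in seqs.items():
            key = "".join(seq[i] for i in sites)
            buckets[key].append(name)
        # Only sequences in the same bucket become candidate neighbours.
        for names in buckets.values():
            for i in range(len(names)):
                for j in range(i + 1, len(names)):
                    pairs.add(tuple(sorted((names[i], names[j]))))
    return pairs


if __name__ == "__main__":
    example = {
        "a": "ACGT" * 10,
        "b": "ACGT" * 10,  # identical to "a": collides in every table
        "c": "TGCA" * 10,  # differs from "a" at every site: never collides
    }
    print(lsh_candidate_pairs(example))
```

Two sequences that differ at a fraction p of sites collide in a single table with probability about (1 - p)^k, so raising `k` filters out distant pairs and raising `tables` recovers close pairs that a single table would miss. Candidate pairs found this way can then be compared directly, which avoids examining all pairs of sequences.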
Taxonomy
Topics: Genomics and Phylogenetic Studies · Algorithms and Data Compression · Genetic diversity and population structure
