Hide and seek: placing and finding an optimal tree for thousands of homoplasy-rich sequences
Dietrich Radel, Andreas Sand, Mike Steel

TL;DR
This paper demonstrates that the TNT search program can locate the provably optimal maximum parsimony tree for thousands of taxa with high homoplasy, using a recent mathematical result to generate test data with a known best tree.
Contribution
It applies a recent mathematical result to generate homoplasy-rich sequence data with a known optimal tree, enabling rigorous testing of tree search programs such as TNT.
Findings
TNT correctly finds the optimal tree for 32,768 taxa.
The dataset has high homoplasy, with an average of 1,148 changes per character.
The method provides a way to validate tree search algorithms on complex data.
Abstract
Finding optimal evolutionary trees from sequence data is typically an intractable problem, and there is usually no way of knowing how close to optimal the best tree from some search truly is. The problem would seem to be particularly acute when we have many taxa and when the data has high levels of homoplasy, in which the individual characters require many changes to fit on the best tree. However, a recent mathematical result has provided a precise tool to generate a small number of high-homoplasy characters for any given tree, so that this tree is provably the optimal tree under the maximum parsimony criterion. This provides, for the first time, a rigorous way to test tree search algorithms on homoplasy-rich data, where we know in advance what the 'best' tree is. In this short note we consider just one search program (TNT) but show that it is able to locate the globally optimal tree…
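To make the notion of "changes per character" under the maximum parsimony criterion concrete, here is a minimal sketch of Fitch's small-parsimony count on a rooted binary tree. It is an illustration only, not the paper's construction and not how TNT is implemented. The nested-tuple tree encoding, the function names `fitch_changes` and `parsimony_score`, and the four-taxon data are all assumptions made for this example.

```python
def fitch_changes(tree, states):
    """Minimum number of state changes one character needs on a rooted binary tree.

    tree   -- a leaf name (str) or a pair (left_subtree, right_subtree)
    states -- dict mapping each leaf name to its observed character state
    """
    changes = 0

    def state_set(node):
        nonlocal changes
        if isinstance(node, str):      # leaf: the set holding its observed state
            return {states[node]}
        left, right = node
        a, b = state_set(left), state_set(right)
        common = a & b
        if common:                     # children agree on at least one state
            return common
        changes += 1                   # disjoint child sets force one change
        return a | b

    state_set(tree)
    return changes


def parsimony_score(tree, characters):
    """Total parsimony score: the sum of per-character change counts."""
    return sum(fitch_changes(tree, c) for c in characters)


# Hypothetical toy data: four taxa, two binary characters.
tree = (("A", "B"), ("C", "D"))
compatible = {"A": 0, "B": 0, "C": 1, "D": 1}    # fits the tree with 1 change
homoplastic = {"A": 0, "B": 1, "C": 0, "D": 1}   # needs 2 changes on this tree

print(fitch_changes(tree, compatible))    # 1
print(fitch_changes(tree, homoplastic))   # 2
print(parsimony_score(tree, [compatible, homoplastic]))  # 3
```

A binary character needs at least one change on any tree, so the second character's extra change is homoplasy. A tree is optimal under maximum parsimony when no other tree gives a lower total score for the same characters.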
Taxonomy
Topics: Genomics and Phylogenetic Studies · Genetic diversity and population structure · RNA and protein synthesis mechanisms
