The most parsimonious tree for random data
Mareike Fischer, Michelle Galla, Lina Herbst, Mike Steel

TL;DR
This paper investigates the bias of maximum parsimony in reconstructing phylogenetic trees from random data, revealing shape preferences and their persistence even as data size increases.
Contribution
It demonstrates that certain tree shapes are more likely to be MP trees from random data and analyzes how these biases behave with increasing data.
Findings
For small data sets and unrooted binary trees on six taxa, caterpillar-shaped trees are more likely to be MP trees than symmetric-shaped trees.
The bias between any two given binary trees vanishes as the number of characters grows.
Nevertheless, certain shape biases among MP trees persist even with large data sets, contrary to initial expectations.
Abstract
Applying a method to reconstruct a phylogenetic tree from random data provides a way to detect whether that method has an inherent bias towards certain tree 'shapes'. For maximum parsimony, applied to a sequence of random 2-state data, each possible binary phylogenetic tree has exactly the same distribution for its parsimony score. Despite this pleasing and slightly surprising symmetry, some binary phylogenetic trees are more likely than others to be a most parsimonious (MP) tree for a sequence of k such characters, as we show. For k = 2, and unrooted binary trees on six taxa, any tree with a caterpillar shape has a higher chance of being an MP tree than any tree with a symmetric shape. On the other hand, if we take any two binary trees, on any number of taxa, we prove that this bias between the two trees vanishes as the number of characters grows. However, again there is a twist: MP…
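The symmetry claim in the abstract (every binary tree has the same parsimony-score distribution on random 2-state data) can be checked by brute force for a single character on six taxa. The following is a minimal sketch, not the authors' code: it assumes the standard Fitch algorithm for scoring, and the nested-tuple tree encoding and the names `fitch_score`, `score_distribution`, `CATERPILLAR`, and `SYMMETRIC` are hypothetical choices made for illustration.

```python
from collections import Counter
from itertools import product

# Hypothetical encoding: a leaf is an int (taxon index), an internal node is a
# pair (left, right). The root placement is arbitrary because the Fitch score
# of an unrooted tree does not depend on where it is rooted.
CATERPILLAR = (((((0, 1), 2), 3), 4), 5)   # two cherries: {0,1} and {4,5}
SYMMETRIC = ((0, 1), ((2, 3), (4, 5)))     # three cherries around one vertex


def fitch_score(tree, character):
    """Return (state_set, score) for one character via Fitch's algorithm."""
    if isinstance(tree, int):
        return frozenset({character[tree]}), 0
    left_set, left_cost = fitch_score(tree[0], character)
    right_set, right_cost = fitch_score(tree[1], character)
    common = left_set & right_set
    if common:
        # The subtrees can agree on a state: no extra change is needed.
        return common, left_cost + right_cost
    # The subtrees disagree: one additional state change is required.
    return left_set | right_set, left_cost + right_cost + 1


def score_distribution(tree, n_taxa=6):
    """Count how many of the 2**n_taxa binary characters give each score."""
    counts = Counter()
    for character in product((0, 1), repeat=n_taxa):
        counts[fitch_score(tree, character)[1]] += 1
    return counts


if __name__ == "__main__":
    cat = score_distribution(CATERPILLAR)
    sym = score_distribution(SYMMETRIC)
    print("caterpillar:", sorted(cat.items()))
    print("symmetric:  ", sorted(sym.items()))
    print("identical distributions:", cat == sym)
```

Because a uniformly random 2-state character is equally likely to be any of the 64 patterns, equal counts mean equal score distributions for the two shapes. This sketch only covers single-character scores; it does not test which tree is most parsimonious for a sequence of characters, which is where the shape bias described in the abstract arises.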
