A basic limitation on inferring phylogenies by pairwise sequence comparisons
Mike Steel

TL;DR
This paper demonstrates a fundamental limitation of inferring phylogenetic trees from pairwise sequence comparisons: when the rate variation across sites is unknown, topologically different trees can produce identical distributions of pairwise site patterns, so methods that use only pairs of sequences cannot tell them apart.
Contribution
The paper shows that, without prior knowledge of the shape parameter governing rate variation across sites, pairwise sequence comparisons cannot distinguish between topologically different phylogenetic trees.
Findings
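Topologically different trees, with different shape parameters and positive branch lengths, can produce identical pairwise site-pattern distributions for all pairs of taxa when the shape parameter is unknown.
Identifiability can be restored when branch lengths are clocklike or when methods such as maximum likelihood are used.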
The limitation applies broadly, not to specific or contrived cases.
Abstract
Distance-based approaches in phylogenetics, such as Neighbor-Joining, are a fast and popular way of building trees. These methods take pairs of sequences and from them construct a value that, in expectation, is additive under a stochastic model of site substitution. Most models assume a distribution of rates across sites, often based on a gamma distribution. Provided the (shape) parameter of this distribution is known, the method can correctly reconstruct the tree. However, if the shape parameter is not known then we show that topologically different trees, with different shape parameters and associated positive branch lengths, can lead to exactly matching distributions on pairwise site patterns between all pairs of taxa. Thus, one could not distinguish between the two trees using pairs of sequences without some prior knowledge of the shape parameter. More surprisingly, this can happen…
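The role of the shape parameter in distance correction can be illustrated with a minimal sketch. It assumes the Jukes-Cantor model with gamma-distributed rates of mean 1, which is a choice made here for illustration and is not stated in the abstract; the quartet, its branch lengths, and all function names are hypothetical. The sketch works with expected values only (no sampling) and does not reproduce the paper's construction. It only shows that corrected distances are additive when the assumed shape parameter matches the true one, and that a mismatched value can make the four-point comparison favor a different split.

```python
def jc_gamma_p(d, alpha):
    """Expected proportion of differing sites for two sequences at
    evolutionary distance d, under Jukes-Cantor with gamma rates
    (shape alpha, mean 1). Model choice is an assumption of this sketch."""
    return 0.75 * (1.0 - (1.0 + 4.0 * d / (3.0 * alpha)) ** (-alpha))


def jc_gamma_distance(p, alpha):
    """Inverse of jc_gamma_p: corrected distance from a proportion p < 0.75."""
    return 0.75 * alpha * ((1.0 - 4.0 * p / 3.0) ** (-1.0 / alpha) - 1.0)


# Hypothetical quartet with true split ab|cd: two short and two long
# pendant branches, joined by a short internal branch.
PENDANT = {"a": 0.1, "b": 0.6, "c": 0.1, "d": 0.6}
INTERNAL = 0.05


def tree_distance(x, y):
    """Path length between leaves x and y on the hypothetical quartet."""
    same_side = {x, y} in ({"a", "b"}, {"c", "d"})
    return PENDANT[x] + PENDANT[y] + (0.0 if same_side else INTERNAL)


def four_point_sums(alpha_true, alpha_assumed):
    """Generate expected p-distances with alpha_true, correct them with
    alpha_assumed, and return the three pairwise sums of the four-point
    condition, keyed by the split each sum supports when it is smallest."""
    def est(x, y):
        p = jc_gamma_p(tree_distance(x, y), alpha_true)
        return jc_gamma_distance(p, alpha_assumed)

    return {
        "ab|cd": est("a", "b") + est("c", "d"),
        "ac|bd": est("a", "c") + est("b", "d"),
        "ad|bc": est("a", "d") + est("b", "c"),
    }


def favored_split(sums):
    """Split whose sum is smallest, as a four-point method would choose."""
    return min(sums, key=sums.get)


correct = four_point_sums(alpha_true=0.5, alpha_assumed=0.5)
mismatched = four_point_sums(alpha_true=0.5, alpha_assumed=2.0)
print(favored_split(correct), correct)
print(favored_split(mismatched), mismatched)
```

With the matching shape parameter the two larger sums coincide, as additivity requires. With the mismatched value they no longer do, so the corrected distances do not fit any tree exactly.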
Taxonomy
Topics: Genomics and Phylogenetic Studies · Genetic diversity and population structure · Evolution and Paleontology Studies
