Impossibility of consistent distance estimation from sequence lengths under the TKF91 model
Wai-Tong Louis Fan, Brandon Legried, Sebastien Roch

TL;DR
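This paper proves that, under the TKF91 model, evolutionary distances cannot be consistently estimated from sequence lengths alone in the asymptotic regime where expected sequence lengths tend to infinity, because the distributions of pairs of sequence lengths at different distances cannot be distinguished with probability going to one.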
Contribution
It establishes that sequence lengths alone do not suffice for consistent distance estimation under the TKF91 model.
Findings
As the expected sequence lengths tend to infinity, the distributions of pairs of sequence lengths at different distances cannot be distinguished with probability going to one.
No consistent distance estimator exists based solely on sequence lengths under the model.
The result identifies a fundamental limitation of distance estimation methods that rely on sequence lengths alone.
Abstract
We consider the problem of distance estimation under the TKF91 model of sequence evolution by insertions, deletions and substitutions on a phylogeny. In an asymptotic regime where the expected sequence lengths tend to infinity, we show that no consistent distance estimation is possible from sequence lengths alone. More formally, we establish that the distributions of pairs of sequence lengths at different distances cannot be distinguished with probability going to one.
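To make the objects in the abstract concrete, the following is a minimal simulation sketch of the sequence-length process under TKF91, not code from the paper. It assumes the standard formulation in which a sequence of n links gains a link at rate lam * (n + 1) (the extra 1 is the immortal link) and loses one at rate mu * n, with lam < mu, so that the stationary length is geometric with mean lam / (mu - lam). The function names and the parameters lam, mu, and t are illustrative choices. A pair of lengths at distance t is drawn by sampling one length from the stationary law and evolving it for time t, which relies on the reversibility of this birth-death process.

```python
import random


def stationary_mean_length(lam, mu):
    """Mean of the geometric stationary length law: lam / (mu - lam)."""
    if not 0 < lam < mu:
        raise ValueError("requires 0 < lam < mu")
    return lam / (mu - lam)


def sample_stationary_length(lam, mu, rng):
    """Draw n with P(n) = (1 - lam/mu) * (lam/mu)**n for n = 0, 1, 2, ..."""
    ratio = lam / mu
    n = 0
    while rng.random() < ratio:
        n += 1
    return n


def evolve_length(n, t, lam, mu, rng):
    """Gillespie simulation of the length process over a time span t.

    Insertions occur at rate lam * (n + 1), deletions at rate mu * n.
    """
    elapsed = 0.0
    while True:
        birth = lam * (n + 1)
        death = mu * n
        total = birth + death  # always positive because lam > 0
        elapsed += rng.expovariate(total)
        if elapsed > t:
            return n
        if rng.random() < birth / total:
            n += 1
        else:
            n -= 1


def sample_length_pair(t, lam, mu, rng):
    """Lengths of two sequences separated by evolutionary time t."""
    first = sample_stationary_length(lam, mu, rng)
    second = evolve_length(first, t, lam, mu, rng)
    return first, second


if __name__ == "__main__":
    rng = random.Random(0)
    lam, mu = 0.9, 1.0  # letting lam approach mu makes the expected length grow
    print("expected length:", stationary_mean_length(lam, mu))
    for t in (0.1, 1.0):
        print("t =", t, "pairs:", [sample_length_pair(t, lam, mu, rng) for _ in range(5)])
```

Moving lam toward mu mimics the regime in which the expected sequence lengths tend to infinity. Comparing samples of length pairs drawn at two different values of t gives an informal feel for the question studied here; it does not reproduce the paper's argument.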
