Accurate reconstruction of insertion-deletion histories by statistical phylogenetics
Oscar Westesson, Gerton Lunter, Benedict Paten, Ian Holmes

TL;DR
This paper uses automata theory to extend the phylogenetic likelihood framework to insertion-deletion histories and benchmarks a relatively new statistical reconstruction algorithm against existing MSA methods in simulation, reporting reduced bias and potential for alignment-free analysis.
Contribution
The paper presents a novel automata-based statistical approach for indel history reconstruction, improving accuracy and enabling alignment-free inference.
Findings
In simulations using mammalian evolutionary parameters on several different trees, the single most likely history sampled by the algorithm appears less biased.
It outperforms existing MSA methods in reconstructing evolutionary histories.
The method is applicable to alignment-free phylogenetic inference.
Abstract
The Multiple Sequence Alignment (MSA) is a computational abstraction that represents a partial summary either of indel history, or of structural similarity. Taking the former view (indel history), it is possible to use formal automata theory to generalize the phylogenetic likelihood framework for finite substitution models (Dayhoff's probability matrices and Felsenstein's pruning algorithm) to arbitrary-length sequences. In this paper, we report results of a simulation-based benchmark of several methods for reconstruction of indel history. The methods tested include a relatively new algorithm for statistical marginalization of MSAs that sums over a stochastically-sampled ensemble of the most probable evolutionary histories. For mammalian evolutionary parameters on several different trees, the single most likely history sampled by our algorithm appears less biased than histories…
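The finite-substitution starting point that the abstract refers to, Felsenstein's pruning algorithm, can be sketched for a single alignment column as below. This is a minimal illustration only, not the paper's method: it assumes a Jukes-Cantor DNA model in place of Dayhoff's protein matrices, a uniform root distribution, and a hypothetical tree encoding (a leaf is a one-character string, an internal node is a list of `(child, branch_length)` pairs). It does not cover the automata-based generalization to arbitrary-length sequences.

```python
import math

ALPHABET = "ACGT"

def jc69_matrix(t):
    """Jukes-Cantor substitution probabilities for branch length t
    (expected substitutions per site). Assumed model, chosen for brevity."""
    e = math.exp(-4.0 * t / 3.0)
    same = 0.25 + 0.75 * e
    diff = 0.25 - 0.25 * e
    return [[same if i == j else diff for j in range(4)] for i in range(4)]

def partial_likelihoods(node):
    """Return L where L[x] = P(observed leaves below node | node is in state x).

    Hypothetical encoding: a leaf is a single character from ALPHABET;
    an internal node is a list of (child, branch_length) pairs.
    """
    if isinstance(node, str):
        return [1.0 if c == node else 0.0 for c in ALPHABET]
    result = [1.0] * 4
    for child, t in node:
        P = jc69_matrix(t)
        child_L = partial_likelihoods(child)
        for x in range(4):
            # Sum over the child's state, weighted by the substitution probability.
            result[x] *= sum(P[x][y] * child_L[y] for y in range(4))
    return result

def site_likelihood(root):
    """Likelihood of one alignment column, with a uniform prior at the root."""
    return sum(0.25 * lx for lx in partial_likelihoods(root))

# Example: a three-leaf tree ((A:0.1, A:0.1):0.05, C:0.2)
tree = [([("A", 0.1), ("A", 0.1)], 0.05), ("C", 0.2)]
likelihood = site_likelihood(tree)
```

The recursion visits each node once and costs time linear in the number of branches, which is the property the paper's framework carries over from single columns to whole sequences with indels.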
