Statistically consistent and computationally efficient inference of ancestral DNA sequences in the TKF91 model under dense taxon sampling
Wai-Tong Louis Fan, Sebastien Roch

TL;DR
This paper introduces a statistically consistent and computationally efficient algorithm for reconstructing ancestral DNA sequences in dense phylogenies under the TKF91 model, accounting for substitutions, insertions, and deletions.
Contribution
It presents the first polynomial-time ancestral reconstruction algorithm with provable guarantees under the TKF91 model for dense phylogenies with bounded height.
Findings
The algorithm is statistically consistent under the big bang condition.
Achieves polynomial-time complexity for ancestral sequence inference.
Works under constant mutation rates in dense phylogenies.
Abstract
In evolutionary biology, the speciation history of living organisms is represented graphically by a phylogeny, that is, a rooted tree whose leaves correspond to current species and whose branchings indicate past speciation events. Phylogenies are commonly estimated from molecular sequences, such as DNA sequences, collected from the species of interest. At a high level, the idea behind this inference is simple: the further apart two species are in the Tree of Life, the greater the number of mutations that have accumulated in their genomes since their most recent common ancestor. In order to obtain accurate estimates in phylogenetic analyses, it is standard practice to employ statistical approaches based on stochastic models of sequence evolution on a tree. For tractability, such models necessarily make simplifying assumptions about the evolutionary mechanisms involved. In particular, commonly…
