Pairwise sequence alignment at arbitrarily large evolutionary distance
Brandon Legried, Sebastien Roch

TL;DR
This paper demonstrates that perfect pairwise sequence alignment is achievable in principle, with high probability, at arbitrarily large evolutionary distances if the phylogeny is known and sufficiently dense. The argument relies on techniques from ancestral sequence reconstruction in the taxon-rich setting.
Contribution
It establishes a formal connection between ancestral sequence reconstruction and multiple sequence alignment, showing that perfect pairwise alignment is possible with high probability at arbitrarily large distances when the phylogeny is known and dense enough.
Findings
Perfect pairwise alignment possible at large evolutionary distances
Known dense phylogeny enables high-probability alignment
Uses ancestral reconstruction and probabilistic models with indels
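As a toy illustration of why many descendant sequences can help ancestral reconstruction, here is a minimal sketch under strong simplifying assumptions that are not the paper's model: a star tree whose leaves evolve independently from the root, substitutions only (no indels, so every leaf stays aligned with the root), and a per-site majority vote as the estimator. The names `mutate` and `majority_reconstruct` are hypothetical. The paper itself handles insertions and deletions on a general dense phylogeny, which this sketch does not attempt.

```python
import random
from collections import Counter

ALPHABET = "ACGT"


def mutate(seq, p, rng):
    """Toy substitution-only model (assumption, not the paper's model):
    each site is independently replaced, with probability p, by a uniformly
    chosen different nucleotide. Length is preserved, so no alignment is needed."""
    out = []
    for c in seq:
        if rng.random() < p:
            out.append(rng.choice([a for a in ALPHABET if a != c]))
        else:
            out.append(c)
    return "".join(out)


def majority_reconstruct(leaves):
    """Estimate the root by a per-site majority vote over equal-length leaves."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*leaves))


if __name__ == "__main__":
    rng = random.Random(0)
    root = "".join(rng.choice(ALPHABET) for _ in range(50))
    # Star tree: 101 leaves, each an independent noisy copy of the root.
    leaves = [mutate(root, 0.3, rng) for _ in range(101)]
    estimate = majority_reconstruct(leaves)
    print(sum(a != b for a, b in zip(root, estimate)), "mismatched sites")
```

With many leaves and a substitution probability well below the point where the true letter stops being the most frequent one, the vote recovers the root at every site with high probability. Once indels are allowed, the leaves no longer share coordinates, and a column-wise vote like this one stops making sense.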
Abstract
Ancestral sequence reconstruction is a key task in computational biology. It consists of inferring a molecular sequence at an ancestral species of a known phylogeny, given descendant sequences at the tips of the tree. In addition to its many biological applications, it has played a key role in elucidating the statistical performance of phylogeny estimation methods. Here we establish a formal connection to another important bioinformatics problem, multiple sequence alignment, where one attempts to best align a collection of molecular sequences under some mismatch penalty score by inserting gaps. Our result is counter-intuitive: we show that perfect pairwise sequence alignment with high probability is possible in principle at arbitrarily large evolutionary distances, provided the phylogeny is known and dense enough. We use techniques from ancestral sequence reconstruction in the taxon-rich…
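To make the score-based formulation in the abstract concrete, the following minimal sketch aligns two sequences by inserting gaps to minimize a total mismatch-plus-gap cost, using the classic Needleman–Wunsch dynamic program. The unit costs and the function name `align` are illustrative assumptions. This is the standard problem statement the abstract refers to, not the phylogeny-aware procedure analyzed in the paper.

```python
def align(x, y, mismatch=1, gap=1):
    """Global pairwise alignment minimizing total cost.

    Matches cost 0, each mismatched pair costs `mismatch`, and each gap
    symbol '-' costs `gap` (linear gap penalty; costs are assumptions).
    Returns (cost, aligned_x, aligned_y).
    """
    n, m = len(x), len(y)
    # D[i][j] = minimum cost of aligning the prefixes x[:i] and y[:j].
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i * gap
    for j in range(1, m + 1):
        D[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if x[i - 1] == y[j - 1] else mismatch
            D[i][j] = min(D[i - 1][j - 1] + sub,  # pair x[i-1] with y[j-1]
                          D[i - 1][j] + gap,      # gap in y
                          D[i][j - 1] + gap)      # gap in x
    # Trace back one optimal path from the bottom-right corner.
    i, j, ax, ay = n, m, [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and D[i][j] == D[i - 1][j - 1] + (
                0 if x[i - 1] == y[j - 1] else mismatch):
            ax.append(x[i - 1]); ay.append(y[j - 1]); i -= 1; j -= 1
        elif i > 0 and D[i][j] == D[i - 1][j] + gap:
            ax.append(x[i - 1]); ay.append("-"); i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1]); j -= 1
    return D[n][m], "".join(reversed(ax)), "".join(reversed(ay))


print(align("ACGT", "AGT"))  # (1, 'ACGT', 'A-GT')
```

The program runs in O(nm) time and space. Note that the perfect alignment in the paper means recovering the true evolutionary correspondence between sites, which a minimum-cost alignment is not guaranteed to reproduce.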
Taxonomy
Topics: Genomics and Phylogenetic Studies · Algorithms and Data Compression · Gene expression and cancer classification
Methods: ALIGN
