Distribution of Aligned Letter Pairs in Optimal Alignments of Random Sequences
Raphael Hauser, Heinrich Matzinger

TL;DR
This paper proves that, for two i.i.d. random sequences of length n aligned under a randomly chosen scoring function, the empirical distribution of aligned letter pairs in optimal alignments almost surely converges to a unique limit as n grows, offering insights into sequence alignment and relatedness testing.
Contribution
It establishes the almost sure convergence of the empirical distribution of aligned pairs in optimal alignments for random sequences with a random scoring function, linking to last passage percolation models.
Findings
The empirical distribution of aligned letter pairs converges almost surely to a unique limit as the sequence length increases.
Results provide a new perspective on the microscopic structure of optimal alignments.
Offers an alternative method for testing genetic sequence relatedness.
Abstract
Considering the optimal alignment of two i.i.d. random sequences of length n, we show that when the scoring function is chosen randomly, almost surely the empirical distribution of aligned letter pairs in all optimal alignments converges to a unique limiting distribution as n tends to infinity. This result is interesting because it helps in understanding the microscopic path structure of a special type of last passage percolation problem with correlated weights, an area of long-standing open problems. Furthermore, characterizing the microscopic path structure yields a robust alternative to optimal alignment scores for testing the relatedness of genetic sequences.
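To make the object in the abstract concrete, the following is a minimal sketch of how the empirical distribution of aligned letter pairs could be computed for one optimal alignment. It is not the authors' code. The function names (`random_scoring`, `optimal_alignment`, `empirical_pair_distribution`), the Gaussian law for the random scores, the treatment of the gap symbol `-` as an extra letter, and the standard Needleman-Wunsch style dynamic program are all assumptions made for illustration.

```python
import random
from collections import Counter


def random_scoring(alphabet, rng):
    """Draw a symmetric random score for every pair of symbols.

    Assumption: scores are i.i.d. standard Gaussians and the gap '-' is
    scored like any other letter; the source does not fix these choices here.
    """
    symbols = list(alphabet) + ["-"]
    score = {}
    for i, a in enumerate(symbols):
        for b in symbols[i:]:
            s = rng.gauss(0.0, 1.0)
            score[(a, b)] = score[(b, a)] = s
    return score


def optimal_alignment(x, y, score):
    """Return (optimal score, aligned pairs) for one optimal global alignment."""
    n, m = len(x), len(y)
    # best[i][j] is the optimal score for aligning x[:i] with y[:j].
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        best[i][0] = best[i - 1][0] + score[(x[i - 1], "-")]
    for j in range(1, m + 1):
        best[0][j] = best[0][j - 1] + score[("-", y[j - 1])]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best[i][j] = max(
                best[i - 1][j - 1] + score[(x[i - 1], y[j - 1])],
                best[i - 1][j] + score[(x[i - 1], "-")],
                best[i][j - 1] + score[("-", y[j - 1])],
            )
    # Trace back from the corner, re-evaluating the same expressions
    # that produced each cell to find a move that attains the maximum.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and best[i][j] == best[i - 1][j - 1] + score[(x[i - 1], y[j - 1])]:
            pairs.append((x[i - 1], y[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and best[i][j] == best[i - 1][j] + score[(x[i - 1], "-")]:
            pairs.append((x[i - 1], "-"))
            i -= 1
        else:
            pairs.append(("-", y[j - 1]))
            j -= 1
    pairs.reverse()
    return best[n][m], pairs


def empirical_pair_distribution(pairs):
    """Relative frequency of each aligned pair (gap pairs included)."""
    counts = Counter(pairs)
    total = len(pairs)
    return {pair: c / total for pair, c in counts.items()}


# Usage: two i.i.d. uniform sequences over a 4-letter alphabet.
rng = random.Random(0)
alphabet = "ACGT"
n = 200
x = [rng.choice(alphabet) for _ in range(n)]
y = [rng.choice(alphabet) for _ in range(n)]
scoring = random_scoring(alphabet, rng)
opt_score, aligned = optimal_alignment(x, y, scoring)
distribution = empirical_pair_distribution(aligned)
```

The sketch traces back only one optimal alignment, whereas the result in the abstract concerns all optimal alignments. With continuous random scores, ties between alignments are not expected in typical draws, which is why a single traceback is used here as a simplification. Repeating the experiment for increasing n and comparing the resulting distributions would be one informal way to observe the convergence the paper proves.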
Taxonomy
Topics: Stochastic processes and statistical mechanics · Bayesian Methods and Mixture Models · Algorithms and Data Compression
