Distribution of Aligned Letter Pairs in Optimal Alignments of Random Sequences

Raphael Hauser; Heinrich Matzinger

arXiv:1211.5491·math.PR·November 26, 2012·1 cites

TL;DR

This paper proves that for large random sequences, the distribution of aligned letter pairs in optimal alignments converges to a unique limit when the scoring function is random, offering insights into sequence alignment and relatedness testing.

Contribution

It establishes almost sure convergence of the empirical distribution of aligned letter pairs in optimal alignments of random sequences under a randomly chosen scoring function, and relates this result to last passage percolation with correlated weights.

Findings

1. The empirical distribution of aligned letter pairs converges to a unique limit as the sequence length increases.

2. The results provide a new perspective on the microscopic structure of optimal alignments.

3. The characterization offers an alternative to optimal alignment scores for testing the relatedness of genetic sequences.

Abstract

Considering the optimal alignment of two i.i.d. random sequences of length $n$, we show that when the scoring function is chosen randomly, almost surely the empirical distribution of aligned letter pairs in all optimal alignments converges to a unique limiting distribution as $n$ tends to infinity. This result is interesting because it helps to understand the microscopic path structure of a special type of last passage percolation problem with correlated weights, an area of long-standing open problems. Furthermore, characterizing the microscopic path structure yields a robust alternative to optimal alignment scores for testing the relatedness of genetic sequences.
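To make the object studied in the abstract concrete, the following is a minimal sketch of how one might compute the empirical distribution of aligned letter pairs for a single optimal alignment. Everything in it is an illustrative assumption rather than the authors' construction: the four-letter alphabet, the gap symbol `-`, the score entries drawn i.i.d. uniformly from [-1, 1], and the helper names `random_score`, `optimal_alignment_pairs`, and `empirical_distribution`. It uses standard global-alignment dynamic programming and traces back only one optimal alignment, whereas the paper's statement concerns all optimal alignments.

```python
import random
from collections import Counter

GAP = "-"  # hypothetical gap symbol, not taken from the paper


def random_score(alphabet, rng):
    """Draw a random scoring function on (alphabet + gap) pairs.

    Assumption: i.i.d. uniform entries; the paper's random model may differ.
    """
    symbols = list(alphabet) + [GAP]
    return {
        (a, b): rng.uniform(-1.0, 1.0)
        for a in symbols
        for b in symbols
        if not (a == GAP and b == GAP)
    }


def optimal_alignment_pairs(x, y, score):
    """Return (optimal score, aligned pairs) for one optimal global alignment."""
    n, m = len(x), len(y)
    # dp[i][j] = best score for aligning x[:i] with y[:j]
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] + score[(x[i - 1], GAP)]
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] + score[(GAP, y[j - 1])]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + score[(x[i - 1], y[j - 1])],
                dp[i - 1][j] + score[(x[i - 1], GAP)],
                dp[i][j - 1] + score[(GAP, y[j - 1])],
            )
    # Trace back one optimal path from (n, m) to (0, 0).
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + score[(x[i - 1], y[j - 1])]:
            pairs.append((x[i - 1], y[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + score[(x[i - 1], GAP)]:
            pairs.append((x[i - 1], GAP))
            i -= 1
        else:
            pairs.append((GAP, y[j - 1]))
            j -= 1
    pairs.reverse()
    return dp[n][m], pairs


def empirical_distribution(pairs):
    """Relative frequency of each aligned pair among all alignment columns."""
    counts = Counter(pairs)
    total = len(pairs)
    return {pair: count / total for pair, count in counts.items()}


rng = random.Random(0)
alphabet = "ACGT"
score = random_score(alphabet, rng)
n = 200
x = [rng.choice(alphabet) for _ in range(n)]
y = [rng.choice(alphabet) for _ in range(n)]

best, pairs = optimal_alignment_pairs(x, y, score)
dist = empirical_distribution(pairs)

# Show the five most frequent aligned pairs in this one optimal alignment.
for pair, freq in sorted(dist.items(), key=lambda kv: -kv[1])[:5]:
    print(pair, round(freq, 3))
```

Under the convergence result stated in the abstract, one would expect the frequencies in `dist` to stabilize as `n` grows when the same `score` is reused with fresh sequences; this sketch only illustrates the quantity and does not verify the theorem.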

Taxonomy

Topics: Stochastic processes and statistical mechanics · Bayesian Methods and Mixture Models · Algorithms and Data Compression