Optimal Detection of Sequence Similarity by Local Alignment
Terence Hwa (UC San Diego), Michael Lässig (MPI Teltow)

TL;DR
This paper provides a theoretical analysis of local alignment algorithms with gaps for DNA sequences, revealing scaling laws and criteria for optimal parameter selection to improve sequence similarity detection.
Contribution
It introduces a novel approach focusing on the score landscape and offers quantitative criteria for choosing optimal scoring parameters near the phase transition.
Findings
Near the log-linear phase transition, the statistics of alignment with gaps differ characteristically from those of gapless alignment.
Optimal scores follow robust scaling laws for uncorrelated sequences.
Deviation from scaling laws indicates sequence homology.
Abstract
The statistical properties of local alignment algorithms with gaps are analyzed theoretically for uncorrelated and correlated DNA sequences. In the vicinity of the log-linear phase transition, the statistics of alignment with gaps is shown to be characteristically different from that of gapless alignment. The optimal scores obtained for uncorrelated sequences obey certain robust scaling laws. Deviation from these scaling laws signals sequence homology, and can be used to guide the empirical selection of scoring parameters for the optimal detection of sequence similarities. This can be accomplished in a computationally efficient way by using a novel approach focusing on the score landscape. Furthermore, by assuming a few gross features characterizing the statistics of underlying sequence-sequence correlations, quantitative criteria are obtained for the choice of optimal scoring…
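To make the object of study concrete, here is a minimal sketch of local alignment with gaps, written as the standard Smith-Waterman recursion with a linear gap cost. The function name and the default values for `match`, `mismatch`, and `gap` are illustrative assumptions, not the scoring parameters analyzed in the paper.

```python
def local_alignment_score(a, b, match=1.0, mismatch=-1.0, gap=-2.0):
    """Best local alignment score of sequences a and b.

    Standard Smith-Waterman recursion with a linear gap cost.
    The default parameter values are illustrative assumptions only.
    """
    prev = [0.0] * (len(b) + 1)  # previous row of the score matrix
    best = 0.0
    for i in range(1, len(a) + 1):
        curr = [0.0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0.0,                 # start a new local alignment here
                prev[j - 1] + pair,  # extend by aligning a[i-1] with b[j-1]
                prev[j] + gap,       # gap in b
                curr[j - 1] + gap,   # gap in a
            )
            if curr[j] > best:
                best = curr[j]
        prev = curr
    return best


if __name__ == "__main__":
    # A cheap gap lets the alignment bridge the inserted T; a costly gap does not.
    print(local_alignment_score("ACGTACGT", "ACGTTACGT", gap=-2.0))
    print(local_alignment_score("ACGTACGT", "ACGTTACGT", gap=-5.0))
```

In this toy pair, the gapped alignment wins when the gap cost is low, while a higher gap cost makes the best ungapped segment optimal. This dependence of the optimal score on the scoring parameters is what the abstract's parameter-selection criteria address.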
Taxonomy
Topics: RNA and protein synthesis mechanisms · Algorithms and Data Compression · Gene expression and cancer classification
