Optimizing Smith-Waterman alignments
Rolf Olsen, Terence Hwa, and Michael Lassig

TL;DR
This paper introduces a statistical framework for optimizing Smith-Waterman local sequence alignments. It includes a new fidelity measure that captures alignment significance and can be optimized over the penalty parameters using only the alignment score data of a single sequence pair.
Contribution
It presents a novel statistical analysis and a fidelity measure for local alignments, enabling parameter optimization based on single sequence pair data.
Findings
New fidelity measure effectively captures alignment significance
Optimization of penalty parameters improves alignment accuracy
The approach is validated through theoretical analysis and simulations
Abstract
Mutual correlation between segments of DNA or protein sequences can be detected by Smith-Waterman local alignments. We present a statistical analysis of alignment of such sequences, based on a recent scaling theory. A new fidelity measure is introduced and shown to capture the significance of the local alignment, i.e., the extent to which the correlated subsequences are correctly identified. It is demonstrated how the fidelity may be optimized in the space of penalty parameters using only the alignment score data of a single sequence pair.
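As a rough illustration of the kind of computation the abstract refers to, the following is a minimal sketch of Smith-Waterman local alignment scoring with adjustable penalty parameters. The function name `smith_waterman_score`, the linear gap model, and the default values for `match`, `mismatch`, and `gap` are assumptions chosen for illustration; they are not taken from the paper, and the paper's fidelity measure is not implemented here because its definition does not appear in this summary.

```python
def smith_waterman_score(a, b, match=1.0, mismatch=-1.0, gap=-2.0):
    """Return (best local alignment score, end position (i, j)).

    Illustrative scoring scheme (an assumption, not the paper's):
    `match` is added for equal symbols, `mismatch` for unequal ones,
    and `gap` for each inserted or deleted symbol (linear gap cost).
    """
    cols = len(b) + 1
    prev = [0.0] * cols              # previous row of the score matrix
    best, best_pos = 0.0, (0, 0)
    for i in range(1, len(a) + 1):
        curr = [0.0] * cols          # current row; column 0 stays at 0
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0.0,                 # local alignment may restart anywhere
                prev[j - 1] + s,     # align a[i-1] with b[j-1]
                prev[j] + gap,       # gap in b
                curr[j - 1] + gap,   # gap in a
            )
            if curr[j] > best:       # keep the first cell reaching the maximum
                best, best_pos = curr[j], (i, j)
        prev = curr
    return best, best_pos


def scan_gap_penalty(a, b, gaps):
    """Score one sequence pair for each gap penalty in `gaps`."""
    return {g: smith_waterman_score(a, b, gap=g)[0] for g in gaps}
```

Calling `scan_gap_penalty(seq1, seq2, [-0.5, -1.0, -2.0])` gives the score of a single sequence pair at several points in penalty-parameter space, which is the type of score data the abstract says the optimization relies on. How those scores are turned into a fidelity estimate is specific to the paper and is left out of this sketch.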
Taxonomy
Topics: Genomics and Phylogenetic Studies · Algorithms and Data Compression · Stochastic processes and statistical mechanics
