Sequence alignment and mutual information
Orion Penner, Peter Grassberger, and Maya Paczuski

TL;DR
This paper introduces a robust method to estimate mutual information from global sequence alignments, potentially providing an objective measure for sequence similarity and alignment quality in computational bioscience.
Contribution
The authors present a simple, flexible approach to estimating mutual information from global alignments. Its estimates agree closely with those from unrelated methods, which makes an objective evaluation of alignment quality possible.
Findings
Mutual information estimates derived from alignments closely match estimates obtained by concatenating the sequences and compressing ("zipping") them.
The approach provides a consistent, model-independent measure of sequence similarity.
Potential applications include assessing alignment quality and significance objectively.
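The concatenate-and-zip comparison mentioned in the findings can be illustrated with a minimal sketch. This is not the authors' alignment-based estimator. It is only the generic compression baseline, MI(X;Y) ≈ C(X) + C(Y) − C(XY), where C is a compressed length in bits. The choice of `lzma` as the compressor, the function names, and the synthetic random sequences are all assumptions made for illustration; a general-purpose compressor adds container overhead and will give only rough values on real DNA.

```python
import lzma
import random


def compressed_bits(seq: str) -> int:
    """Length in bits of the LZMA-compressed sequence (includes container overhead)."""
    return 8 * len(lzma.compress(seq.encode("ascii"), preset=9))


def mi_by_compression(x: str, y: str) -> int:
    """Rough MI estimate in bits: C(x) + C(y) - C(xy).

    Assumption: lzma stands in for whatever compressor one would really use;
    the result is biased by header overhead and is only meaningful for comparisons.
    """
    return compressed_bits(x) + compressed_bits(y) - compressed_bits(x + y)


# Synthetic data (hypothetical, for illustration only).
rng = random.Random(0)
a = "".join(rng.choices("ACGT", k=4000))
b = "".join(rng.choices("ACGT", k=4000))

mi_related = mi_by_compression(a, a)    # identical sequences share all their information
mi_unrelated = mi_by_compression(a, b)  # independent sequences share almost none
```

In this sketch, a pair of identical sequences should score far higher than a pair of independently generated ones, which is the qualitative behaviour a similarity measure based on mutual information is expected to show.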
Abstract
Background: Alignment of biological sequences such as DNA, RNA or proteins is one of the most widely used tools in computational bioscience. All existing alignment algorithms rely on heuristic scoring schemes based on biological expertise. Therefore, these algorithms do not provide model-independent and objective measures for how similar two (or more) sequences actually are. Although information theory provides such a similarity measure, the mutual information (MI), previous attempts to connect sequence alignment and information theory have not produced realistic estimates for the MI from a given alignment. Results: Here we describe a simple and flexible approach to get robust estimates of MI from global alignments. For mammalian mitochondrial DNA, our approach gives pairwise MI estimates for commonly used global alignment algorithms that are strikingly close to estimates…
Taxonomy
Topics: Genomics and Phylogenetic Studies · Machine Learning in Bioinformatics · RNA and protein synthesis mechanisms
