Identifying statistical dependence in genomic sequences via mutual information estimates
H.M. Aktulga, I. Kontoyiannis, L.A. Lyznik, L. Szpankowski, A.Y. Grama, and W. Szpankowski

TL;DR
This paper introduces an information-theoretic method based on mutual information to identify and quantify statistical dependencies in genomic sequences, aiding in understanding genetic structures and variations.
Contribution
The paper presents a novel, precise methodology using mutual information for detecting dependencies in DNA and RNA sequences, with applications in gene analysis and genetic profiling.
Findings
Significant dependencies were detected between different parts of the maize zmSRp32 gene.
Short tandem repeats were effectively identified in DNA data.
The method proved reliable for biological sequence analysis.
Abstract
Questions of understanding and quantifying the representation and amount of information in organisms have become a central part of biological research, as they potentially hold the key to fundamental advances. In this paper, we demonstrate the use of information-theoretic tools for the task of identifying segments of biomolecules (DNA or RNA) that are statistically correlated. We develop a precise and reliable methodology, based on the notion of mutual information, for finding and extracting statistical as well as structural dependencies. A simple threshold function is defined, and its use in quantifying the level of significance of dependencies between biological segments is explored. These tools are used in two specific applications. First, for the identification of correlations between different parts of the maize zmSRp32 gene. There, we find significant dependencies between the 5'…
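As a rough illustration of the idea in the abstract, the following sketch computes a plug-in (empirical) mutual information estimate between two equal-length DNA segments and compares it against a simple significance threshold. It is not the authors' implementation. The function names, the position-by-position pairing of the two segments, and the specific threshold are all assumptions made for this example: the threshold uses the standard result that, for independent samples, 2n ln(2) times the estimated mutual information (in bits) is approximately chi-squared distributed, with 9 degrees of freedom for a 4-letter alphabet.

```python
from collections import Counter
from math import log, log2

# Chi-squared critical value for 9 degrees of freedom at the 5% level.
# 9 = (4 - 1) * (4 - 1) assumes a 4-letter alphabet (A, C, G, T).
CHI2_95_DOF9 = 16.919


def mutual_information(x: str, y: str) -> float:
    """Plug-in estimate of mutual information (in bits) between the
    symbols of x and y, pairing position i of x with position i of y.
    The pairing scheme is an assumption of this sketch."""
    if len(x) != len(y) or not x:
        raise ValueError("segments must be non-empty and of equal length")
    n = len(x)
    joint = Counter(zip(x, y))   # empirical joint counts
    px = Counter(x)              # empirical marginal counts for x
    py = Counter(y)              # empirical marginal counts for y
    mi = 0.0
    for (a, b), c in joint.items():
        # (c/n) * log2( (c/n) / ((px[a]/n) * (py[b]/n)) )
        mi += (c / n) * log2(c * n / (px[a] * py[b]))
    return mi


def is_significant(x: str, y: str, critical: float = CHI2_95_DOF9) -> bool:
    """Hypothetical threshold test: flag dependence when the scaled
    estimate 2 * n * ln(2) * I exceeds the chi-squared critical value."""
    n = len(x)
    return 2 * n * log(2) * mutual_information(x, y) > critical


if __name__ == "__main__":
    a = "ACGT" * 25
    print(mutual_information(a, a), is_significant(a, a))
    print(mutual_information("AACC" * 25, "ACAC" * 25))
```

In practice such a test would be applied over many pairs of segments, so some correction for multiple comparisons would be needed; the sketch leaves that out.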
Taxonomy
Topics: RNA and protein synthesis mechanisms · Fractal and DNA sequence analysis · Machine Learning in Bioinformatics
