Entropy and long-range correlations in DNA sequences
S.S. Melnik, O.V. Usatenko

TL;DR
This paper develops a theoretical framework using additive Markov chains to analyze the entropy and long-range correlations in DNA sequences, enabling better understanding of genetic information and potential biological classification.
Contribution
It introduces an additive Markov chain method that computes the entropy of DNA sequences from two-point correlators, which makes much longer subsequences accessible than standard block-probability estimates.
Findings
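An analytical expression for the entropy as a functional of the pair correlator, derived under the assumption of weak correlations.
Entropy of subsequences can be evaluated at much longer distances than with standard block-probability methods.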
Potential application in biological classification of species.
Abstract
We analyze the structure of DNA molecules of different organisms by using the additive Markov chain approach. Transforming nucleotide sequences into binary strings, we perform statistical analysis of the corresponding "texts". We develop the theory of N-step additive binary stationary ergodic Markov chains and analyze their differential entropy. Supposing that the correlations are weak, we express the conditional probability function of the chain by means of the pair correlation function and represent the entropy as a functional of the pair correlator. Since the model uses two-point correlators instead of the probabilities of block occurrence, it makes it possible to calculate the entropy of subsequences at much longer distances than with the standard methods. We utilize the obtained analytical result for numerical evaluation of the entropy of coarse-grained DNA texts. We believe that…
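The pipeline described in the abstract (binary mapping, pair correlator, entropy as a functional of the correlator) can be illustrated with a minimal Python sketch. Several pieces here are assumptions not stated in the text above: the purine/pyrimidine mapping `PURINE_MAP`, the function names, and the specific weak-correlation form of the entropy, taken as h_L ≈ h_0 − (1 / (2 ln 2)) Σ_{r=1..L} K(r)², where K(r) = C(r)/C(0) is the normalized pair correlator and h_0 is the entropy of an uncorrelated binary sequence with the same mean. That form should be checked against the paper's own expression before any serious use.

```python
import math

# Hypothetical mapping (purines -> 0, pyrimidines -> 1); the abstract does not
# say which binary mapping the authors use.
PURINE_MAP = {"A": 0, "G": 0, "C": 1, "T": 1}


def to_binary(seq, mapping=PURINE_MAP):
    """Map a nucleotide string to a list of 0/1, skipping unknown symbols."""
    return [mapping[ch] for ch in seq.upper() if ch in mapping]


def pair_correlator(bits, r):
    """Sample estimate of C(r) = <(a_i - mean)(a_{i+r} - mean)>."""
    mean = sum(bits) / len(bits)
    n = len(bits) - r
    return sum((bits[i] - mean) * (bits[i + r] - mean) for i in range(n)) / n


def entropy_estimate(bits, max_lag):
    """Weak-correlation entropy estimate in bits per symbol (assumed form).

    h_L ~ h_0 - (1 / (2 ln 2)) * sum_{r=1..L} K(r)^2, with K(r) = C(r) / C(0).
    """
    mean = sum(bits) / len(bits)
    if mean in (0.0, 1.0):
        return 0.0  # a constant sequence carries no information
    h0 = -mean * math.log2(mean) - (1 - mean) * math.log2(1 - mean)
    c0 = pair_correlator(bits, 0)
    correction = sum(
        (pair_correlator(bits, r) / c0) ** 2 for r in range(1, max_lag + 1)
    )
    return h0 - correction / (2 * math.log(2))
```

A call such as `entropy_estimate(to_binary(dna_string), max_lag=100)` needs only the first 100 values of the correlator, whereas a block-probability estimate at the same length would need statistics for 2^100 possible blocks. The sketch is only meaningful when K(r) is small; for strongly correlated or short sequences the sampled correlator is noisy and the estimate can even turn negative.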
