Entropy and long-range correlations in random symbolic sequences
S.S. Melnik, O.V. Usatenko

TL;DR
This paper develops a method for estimating the entropy of long-range correlated symbolic sequences, such as English text and DNA. It models them as high-order additive Markov chains and expresses the entropy through the pair correlation function, which separates a correlation contribution from a fluctuation contribution.
Contribution
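It introduces an analytical approach for estimating the entropy of weakly correlated, long-range correlated sequences over a finite alphabet. The approach is built on high-order additive Markov chains and the symbolic pair correlation function, and it is intended as a basis for describing more complex systems.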
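The model behind this approach is a high-order additive Markov chain. The following minimal Python sketch illustrates only the binary (two-symbol) case. It assumes the additive form in which the probability of the next symbol is the mean plus a weighted sum of the deviations of the previous N symbols, with weights given by a memory function F(r). The paper itself treats a general finite alphabet, and the function name `additive_markov_chain` and its parameters are hypothetical.

```python
import random

def additive_markov_chain(length, mean, memory, seed=0):
    """Sample a binary (0/1) sequence from an additive Markov chain (illustrative sketch).

    Assumed conditional probability of a 1 at position i:
        P(a_i = 1 | past) = mean + sum_{r=1..N} memory[r-1] * (a_{i-r} - mean)
    where N = len(memory). Probabilities are clipped to [0, 1] as a safeguard.
    """
    rng = random.Random(seed)
    n = len(memory)
    # The first N symbols have no full history, so draw them independently.
    seq = [1 if rng.random() < mean else 0 for _ in range(n)]
    for i in range(n, length):
        # Linear ("additive") dependence on the previous N symbols.
        p = mean + sum(memory[r - 1] * (seq[i - r] - mean) for r in range(1, n + 1))
        p = min(1.0, max(0.0, p))
        seq.append(1 if rng.random() < p else 0)
    return seq[:length]
```

With `mean=0.5` and `memory=[0.4]` (N = 1), a symbol repeats the previous one with probability 0.7, so adjacent symbols agree about 70% of the time. When `mean=0.5`, keeping the sum of |F(r)| at or below 1 keeps every conditional probability inside [0, 1]. Small memory values roughly correspond to the weak-correlation regime the paper assumes.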
Findings
The entropy consists of two contributions: a correlation term and a fluctuation term.
The method is applied to estimate the entropy of written English texts and DNA nucleotide sequences.
Analytical results align with numerical evaluations.
Abstract
The goal of this paper is to develop an estimate for the entropy of random long-range correlated symbolic sequences with elements belonging to a finite alphabet. As a plausible model, we use the high-order additive stationary ergodic Markov chain. Supposing that the correlations between random elements of the chain are weak, we express the differential entropy of the sequence by means of the symbolic pair correlation function. We also examine an algorithm for estimating the differential entropy of finite symbolic sequences. We show that the entropy contains two contributions, the correlation and fluctuation ones. The obtained analytical results are used for numerical evaluation of the entropy of written English texts and DNA nucleotide sequences. The developed theory opens the way for constructing a more consistent and sophisticated approach to describe systems with strong short- and…
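The abstract mentions an algorithm for estimating the differential entropy of finite symbolic sequences. That algorithm is not reproduced here. As a stand-in, the Python sketch below uses the standard plug-in block-entropy estimate h_n = H_{n+1} − H_n, where H_n is the Shannon entropy, in bits, of the empirical distribution of overlapping n-symbol blocks. The function names are hypothetical.

```python
import math
import random
from collections import Counter

def block_entropy(seq, n):
    """Plug-in (maximum-likelihood) Shannon entropy, in bits, of overlapping n-blocks."""
    if n == 0:
        return 0.0
    counts = Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def differential_entropy(seq, n):
    """Estimate h_n = H_{n+1} - H_n: entropy of the next symbol given the n previous ones."""
    return block_entropy(seq, n + 1) - block_entropy(seq, n)

# Usage: an uncorrelated fair binary sequence, whose true h_n is 1 bit for every n.
rng = random.Random(1)
fair_bits = [rng.randint(0, 1) for _ in range(2000)]
print(round(differential_entropy(fair_bits, 0), 3))   # close to 1 bit
print(round(differential_entropy(fair_bits, 10), 3))  # well below 1: finite-length bias
```

The n = 10 estimate falls well below the true value. The reason is that 2000 symbols cannot adequately sample the 2^11 possible 11-symbol blocks. This shows the kind of finite-length fluctuation effect that an entropy estimate from a finite sequence has to account for. It is not claimed to reproduce the paper's fluctuation term.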
Taxonomy
Topics: Fractal and DNA sequence analysis · Machine Learning in Bioinformatics · RNA and protein synthesis mechanisms
