Entropy estimation of symbol sequences
Thomas Schürmann, Peter Grassberger

TL;DR
This paper studies algorithms that estimate the Shannon entropy h of symbol sequences with long-range correlations from the code lengths produced by compression algorithms. It examines how these estimates converge with sequence length and proposes a scaling law for extrapolating from finite samples.
Contribution
It introduces a scaling law for entropy estimation convergence and applies it to chaotic systems, cellular automata, and written language data.
Findings
Scaling law effectively extrapolates entropy from finite samples
Algorithms provide consistent entropy estimates for complex sequences
The method is applied to chaotic dynamical systems, a 1-D cellular automaton, and written English texts
Abstract
We discuss algorithms for estimating the Shannon entropy h of finite symbol sequences with long-range correlations. In particular, we consider algorithms which estimate h from the code lengths produced by some compression algorithm. Our interest is in describing their convergence with sequence length, assuming no limits for the space and time complexities of the compression algorithms. A scaling law is proposed for extrapolation from finite sample lengths. This is applied to sequences of dynamical systems in non-trivial chaotic regimes, a 1-D cellular automaton, and to written English texts.
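The idea of reading an entropy estimate off a compressor's code length can be illustrated with a minimal Python sketch. It is not the authors' algorithm: zlib (DEFLATE) stands in here as an off-the-shelf compressor, and the helper names `compressed_bits_per_symbol` and `convergence_table`, as well as the synthetic test source, are assumptions introduced only for illustration. The estimate at prefix length N is simply the compressed size in bits divided by N, and tabulating it for growing N shows the kind of finite-sample convergence the abstract refers to.

```python
import random
import zlib


def compressed_bits_per_symbol(symbols: bytes) -> float:
    """Code length per symbol, in bits, using zlib as a stand-in compressor.

    Each byte of `symbols` is treated as one symbol (an assumption of this sketch).
    """
    if not symbols:
        raise ValueError("need at least one symbol")
    return 8 * len(zlib.compress(symbols, 9)) / len(symbols)


def convergence_table(symbols: bytes, lengths):
    """Return (N, estimate) pairs for prefixes of length N of the sequence."""
    return [
        (n, compressed_bits_per_symbol(symbols[:n]))
        for n in lengths
        if n <= len(symbols)
    ]


# Hypothetical test source: i.i.d. fair binary symbols, true entropy 1 bit/symbol.
rng = random.Random(0)
fair_bits = bytes(rng.getrandbits(1) for _ in range(1 << 16))

for n, h_n in convergence_table(fair_bits, [1 << k for k in range(10, 17, 2)]):
    print(f"N = {n:6d}   estimate = {h_n:.3f} bits/symbol")
```

For a general-purpose compressor such as zlib, the estimate sits above the true entropy at every finite N, which is why an extrapolation rule of the kind the paper proposes is needed before quoting a value for h.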
