A Lower Bound on the Complexity of Approximating the Entropy of a Markov Source
Travis Gagie

TL;DR
This paper proves a lower bound on the number of samples needed to approximate the entropy of a (k)th-order Markov source: even when the entropy is known to be either 0 or at least (\log (\sigma - k)), the two cases cannot be told apart from a small sample.
Contribution
It proves a lower bound on the sample complexity of approximating the entropy of a Markov source, one that holds for every algorithm with black-box sampling access to the source.
Findings
No algorithm can guess the entropy correctly, with probability bounded away from (1 / 2), after sampling at most ((\sigma - k)^{k / 2 - \epsilon}) characters.
The lower bound holds even when the entropy is known to be either 0 or at least (\log (\sigma - k)).
The number of samples in the bound grows exponentially with the order (k) of the Markov source.
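To get a feel for how quickly the bound grows with the order, the expression ((\sigma - k)^{k / 2 - \epsilon}) can be evaluated directly. The following is a minimal sketch, not taken from the paper: the function names `sample_lower_bound` and `entropy_gap_bits` are hypothetical, and the logarithm is assumed to be base 2. The code only evaluates the formula; it does not check the paper's "sufficiently large (\sigma)" condition.

```python
import math


def sample_lower_bound(sigma: int, k: int, epsilon: float) -> float:
    """Evaluate (sigma - k) ** (k / 2 - epsilon) from the stated bound.

    Hypothetical helper: it only computes the expression and does not
    verify that sigma is "sufficiently large" in the paper's sense.
    """
    if k < 1 or epsilon <= 0 or sigma <= k:
        raise ValueError("need k >= 1, epsilon > 0 and sigma > k")
    return (sigma - k) ** (k / 2 - epsilon)


def entropy_gap_bits(sigma: int, k: int) -> float:
    """Gap between the two entropy cases, 0 and log(sigma - k).

    Assumption: the logarithm is taken base 2 (bits).
    """
    if sigma <= k:
        raise ValueError("need sigma > k")
    return math.log2(sigma - k)


if __name__ == "__main__":
    # Keep sigma - k fixed at 1024 so only the order k changes.
    for k in (2, 4, 8):
        sigma = 1024 + k
        print(k, entropy_gap_bits(sigma, k), sample_lower_bound(sigma, k, 0.5))
```

With (\sigma - k) held at 1024 and (\epsilon = 0.5), the entropy gap stays at 10 bits while the sample count in the bound rises from roughly 32 at (k = 2) to roughly (3.4 \times 10^{10}) at (k = 8).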
Abstract
Suppose that, for any (k \geq 1), (\epsilon > 0) and sufficiently large (\sigma), we are given a black box that allows us to sample characters from a (k)th-order Markov source over the alphabet (\{0, ..., \sigma - 1\}). Even if we know the source has entropy either 0 or at least (\log (\sigma - k)), there is still no algorithm that, with probability bounded away from (1 / 2), guesses the entropy correctly after sampling at most ((\sigma - k)^{k / 2 - \epsilon}) characters.
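One way to build intuition for why short samples are uninformative is to look at a naive plug-in estimator of (k)th-order entropy. This is an illustration only and is not the paper's proof technique: the function name `empirical_kth_order_entropy` is hypothetical and logarithms are assumed to be base 2. The estimator averages, over the length-(k) contexts seen in the sample, the entropy of the characters that follow each context.

```python
import math
from collections import Counter, defaultdict


def empirical_kth_order_entropy(sample, k):
    """Plug-in estimate of k-th order conditional entropy, in bits per character.

    Hypothetical helper for illustration. `sample` is a sequence of
    characters (for example integers in {0, ..., sigma - 1}).
    """
    # For every length-k context, count which characters follow it.
    followers = defaultdict(Counter)
    for i in range(len(sample) - k):
        context = tuple(sample[i:i + k])
        followers[context][sample[i + k]] += 1

    total = sum(sum(counts.values()) for counts in followers.values())
    if total == 0:
        return 0.0

    # H = - sum over (context, char) of p(context, char) * log2 p(char | context)
    entropy = 0.0
    for counts in followers.values():
        n_context = sum(counts.values())
        for n in counts.values():
            entropy -= (n / total) * math.log2(n / n_context)
    return entropy


if __name__ == "__main__":
    # Every length-2 context in this sample occurs exactly once,
    # so the estimate is 0 whatever source produced the sample.
    print(empirical_kth_order_entropy(list(range(10)), 2))
```

When no length-(k) context repeats in the sample, every context has exactly one observed follower and the estimate is 0, whether the source is deterministic or has high entropy. This estimator therefore cannot separate the two cases until contexts start to repeat, which is consistent in spirit with a bound that grows with the number of possible contexts.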
Taxonomy
Topics: Algorithms and Data Compression · Semigroups and Automata Theory · DNA and Biological Computing
