Regular Hilberg Processes: An Example of Processes with a Vanishing Entropy Rate
Łukasz Dębowski

TL;DR
This paper constructs random hierarchical association (RHA) processes, an explicit class of regular Hilberg processes. They combine hyperlogarithmic growth of maximal repetition with power-law growth of topological entropy, which implies a vanishing Shannon entropy rate.
Contribution
It gives a constructive example of regular Hilberg processes, the RHA processes, without using the standard cutting and stacking method.
Findings
For RHA processes, the expected length of any uniquely decodable code is orders of magnitude larger than the Shannon block entropy of the ergodic component
RHA processes exhibit hyperlogarithmic growth of maximal repetition
They provide a counterexample to universal redundancy rate results
Abstract
A regular Hilberg process is a stationary process that satisfies both a hyperlogarithmic growth of maximal repetition and a power-law growth of topological entropy, which are a kind of dual conditions. The hyperlogarithmic growth of maximal repetition has been experimentally observed for texts in natural language, whereas the power-law growth of topological entropy implies a vanishing Shannon entropy rate and thus probably does not hold for natural language. In this paper, we provide a constructive example of regular Hilberg processes, which we call random hierarchical association (RHA) processes. Our construction does not apply the standard cutting and stacking method. For the constructed RHA processes, we demonstrate that the expected length of any uniquely decodable code is orders of magnitude larger than the Shannon block entropy of the ergodic component of the RHA process. Our…
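To make the notion of maximal repetition in the abstract concrete, here is a minimal Python sketch. It assumes the common definition, which this page does not state: the maximal repetition of a string is the length of its longest substring that occurs at least twice, with overlapping occurrences allowed. The function name `maximal_repetition` is a hypothetical choice for illustration and is not taken from the paper.

```python
def maximal_repetition(text: str) -> int:
    """Length of the longest substring occurring at least twice in `text`.

    Assumed definition: occurrences may overlap. Illustrative only.
    """
    n = len(text)

    def has_repeat(k: int) -> bool:
        # True if some substring of length k appears at two different positions.
        seen = set()
        for i in range(n - k + 1):
            chunk = text[i:i + k]
            if chunk in seen:
                return True
            seen.add(chunk)
        return False

    # If a substring of length k repeats, so does its prefix of length k - 1,
    # so the predicate is monotone in k and binary search applies.
    lo, hi = 0, max(n - 1, 0)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if has_repeat(mid):
            lo = mid
        else:
            hi = mid - 1
    return lo


if __name__ == "__main__":
    for sample in ["abcabc", "aaaa", "abcd"]:
        print(sample, maximal_repetition(sample))
```

Hyperlogarithmic growth then means that this quantity, computed on prefixes of length n of a text, grows faster than log n. The sketch uses plain substring hashing for readability; a suffix-array or suffix-automaton approach would be preferable for long texts.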
