Constant conditional entropy and related hypotheses
Ramon Ferrer-i-Cancho, Łukasz Dębowski, Fermín Moscoso del Prado Martín

TL;DR
This paper revisits the predictions of the constant entropy rate (CER) and uniform information density (UID) principles in language, shows that they are inconsistent with empirical laws such as Hilberg's law, and concludes that these hypotheses are incomplete.
Contribution
The paper re-examines the predictions of the CER and UID principles in the light of Hilberg's law on the scaling of conditional entropy, and argues that CER and its particular cases are incomplete hypotheses.
Findings
CER and strong UID imply uncorrelated, unrealistic sequences
Full UID leads to costly uncorrelated sequences
CER and its variants are incomplete hypotheses
Abstract
Constant entropy rate (conditional entropies must remain constant as the sequence length increases) and uniform information density (conditional probabilities must remain constant as the sequence length increases) are two information theoretic principles that are argued to underlie a wide range of linguistic phenomena. Here we revise the predictions of these principles in the light of Hilberg's law on the scaling of conditional entropy in language and related laws. We show that constant entropy rate (CER) and two interpretations for uniform information density (UID), full UID and strong UID, are inconsistent with these laws. Strong UID implies CER but the reverse is not true. Full UID, a particular case of UID, leads to costly uncorrelated sequences that are totally unrealistic. We conclude that CER and its particular cases are incomplete hypotheses about the scaling of conditional…
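The verbal definitions in the abstract can be written out as a rough sketch. The notation below ($X_n$ for the $n$-th element of a sequence, constants $h$, $q$, $C$, $\alpha$) is assumed for illustration and is not taken from the abstract. The power-law form given for Hilberg's law is the commonly cited one and is also an assumption here, since the abstract does not state the formula. The paper's exact definitions of full and strong UID may differ in detail.

```latex
% Assumed notation: X_1, X_2, ... is the sequence; h, q, C, alpha are constants.
\begin{align*}
\text{CER:} \quad & H(X_n \mid X_1, \dots, X_{n-1}) = h
    && \text{for all } n, \\
\text{UID (sketch):} \quad & P(x_n \mid x_1, \dots, x_{n-1}) = q
    && \text{for all } n \text{ and all sequences of positive probability}, \\
\text{Hilberg's law (assumed form):} \quad
    & H(X_n \mid X_1, \dots, X_{n-1}) \approx C\, n^{\alpha - 1},
    && 0 < \alpha < 1 .
\end{align*}

% Why constant conditional probabilities give constant conditional entropy:
% the conditional entropy is the expectation of the surprisal, so
\begin{equation*}
H(X_n \mid X_1, \dots, X_{n-1})
  = \mathbb{E}\bigl[-\log P(X_n \mid X_1, \dots, X_{n-1})\bigr]
  = -\log q .
\end{equation*}
```

Under these assumptions, a conditional entropy that decays as $n^{\alpha - 1}$ cannot stay equal to a constant $h$, which is the sense in which CER conflicts with Hilberg's law. The converse direction fails because a constant expectation does not force every individual surprisal to be constant.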
