Information content versus word length in random typing
Ramon Ferrer-i-Cancho, Fermín Moscoso del Prado Martín

TL;DR
This paper examines the relationship between information content and word length in random typing. It shows that random typing produces a linear relationship between the two, so linearity alone does not imply that word lengths have been optimized.
Contribution
It demonstrates that a linear relationship between information content and word length can occur in random typing models, challenging the idea that such linearity indicates linguistic optimization.
Findings
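Random typing models exhibit a linear relationship between information content and word length.
The exact slope and intercept are given for three variants of the random typing process.
A strong correlation can arise from the units that make up a word (e.g., letters) rather than from optimization.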
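To see why linearity follows from the units alone, here is a short worked derivation for the simplest case. It is a sketch under stated assumptions, not the paper's own derivation: the symbols $N$ (number of letters), $p_s$ (probability of pressing the space key) and $\ell$ (word length) are introduced here for illustration, letters are assumed equiprobable, and key presses are assumed independent. The paper's exact slopes and intercepts for its three variants are not reproduced here.

```latex
% Probability of typing one specific string w of \ell letters followed by a space,
% with each letter pressed with probability (1 - p_s)/N:
p(w) = \left(\frac{1 - p_s}{N}\right)^{\ell} p_s

% Taking the negative logarithm gives a quantity that is linear in \ell:
-\log p(w) = \ell \, \log\frac{N}{1 - p_s} \; - \; \log p_s

% slope:     \log\frac{N}{1 - p_s}
% intercept: -\log p_s
```

Under these assumptions, changing how words are normalized (for example, excluding empty words) shifts only the intercept; the slope stays the same.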
Abstract
Recently, it has been claimed that a linear relationship between a measure of information content and word length is expected from word length optimization and it has been shown that this linearity is supported by a strong correlation between information content and word length in many languages (Piantadosi et al. 2011, PNAS 108, 3825-3826). Here, we study in detail some connections between this measure and standard information theory. The relationship between the measure and word length is studied for the popular random typing process where a text is constructed by pressing keys at random from a keyboard containing letters and a space behaving as a word delimiter. Although this random process does not optimize word lengths according to information content, it exhibits a linear relationship between information content and word length. The exact slope and intercept are presented for…
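The random typing process described above can be simulated in a few lines. The following is a minimal sketch, not the authors' code: the function names, the two-letter keyboard, the space probability of 0.2 and the restriction to word lengths 1 to 4 are all illustrative assumptions. It also assumes that, because key presses are independent, a word's context carries no information, so information content is estimated as the plain surprisal -log2 p(w) of each word. Empty words (consecutive spaces) are dropped, which gives an expected slope of log2(N / (1 - p_s)) and an expected intercept of log2((1 - p_s) / p_s) for this particular setup.

```python
import math
import random
from collections import Counter


def random_typing_words(n_keys, n_letters=2, p_space=0.2, seed=0):
    """Press n_keys keys at random and return the space-delimited words.

    Each key is a space with probability p_space, otherwise a letter chosen
    uniformly from the first n_letters letters. Empty words are dropped.
    """
    rng = random.Random(seed)
    letters = "abcdefghijklmnopqrstuvwxyz"[:n_letters]
    keys = [" " if rng.random() < p_space else rng.choice(letters)
            for _ in range(n_keys)]
    return "".join(keys).split()


def mean_information_by_length(words, max_len=4):
    """Estimate -log2 p(w) per word type and average it within each length."""
    counts = Counter(words)
    total = sum(counts.values())
    by_len = {}
    for word, count in counts.items():
        if len(word) <= max_len:  # long words are too rare to estimate well
            by_len.setdefault(len(word), []).append(-math.log2(count / total))
    lengths = sorted(by_len)
    means = [sum(by_len[n]) / len(by_len[n]) for n in lengths]
    return lengths, means


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


N_LETTERS = 2   # assumed keyboard size (illustrative)
P_SPACE = 0.2   # assumed probability of hitting the space key (illustrative)

words = random_typing_words(200_000, n_letters=N_LETTERS, p_space=P_SPACE)
lengths, mean_info = mean_information_by_length(words)
slope, intercept = fit_line(lengths, mean_info)

# Values expected for this setup (empty words excluded).
expected_slope = math.log2(N_LETTERS / (1 - P_SPACE))
expected_intercept = math.log2((1 - P_SPACE) / P_SPACE)

print(f"fitted slope     {slope:.3f}  (expected {expected_slope:.3f})")
print(f"fitted intercept {intercept:.3f}  (expected {expected_intercept:.3f})")
```

No optimization step appears anywhere in the simulation, yet the fitted line should track the expected one closely. Longer words are excluded only because their frequencies are estimated from too few occurrences in a sample of this size.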
