The Predictability of Letters in Written English
Thomas Schürmann, Peter Grassberger

TL;DR
This paper investigates how the predictability of letters in written English varies with their position within words. It finds that letters inside a word are far more predictable than the first letter, consistent with words acting as well-defined subunits of written text.
Contribution
It demonstrates the strong dependence of letter predictability on position within words and quantifies the entropy differences between initial and internal letters.
Findings
First letters are least predictable.
The average entropy of a letter deep inside a word is roughly four times smaller than the entropy of the first letter.
Words act as well-defined subunits with weaker cross-unit correlations.
Abstract
We show that the predictability of letters in written English texts depends strongly on their position in the word. The first letters are usually the least easy to predict. This agrees with the intuitive notion that words are well-defined subunits in written languages, with much weaker correlations across these units than within them. It implies that the average entropy of a letter deep inside a word is roughly 4 times smaller than the entropy of the first letter.
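To make the idea of position-dependent entropy concrete, here is a minimal Python sketch. It is not the authors' estimator, which is not described on this page. It assumes a simple plug-in estimate of the conditional entropy of the letter at position k given only the preceding letters of the same word, so it ignores all context from earlier words. The function name `positional_entropy` and the parameter `max_pos` are hypothetical, chosen for illustration. Plug-in estimates are biased downward on small samples, so the numbers are only indicative.

```python
import math
import re
from collections import Counter, defaultdict


def positional_entropy(text, max_pos=6):
    """Plug-in estimate, in bits, of H(letter at position k | earlier letters
    of the same word) for k = 0 .. max_pos - 1.

    Assumption: context is restricted to the current word's prefix;
    cross-word correlations are ignored entirely.
    """
    words = re.findall(r"[a-z]+", text.lower())

    # counts[k][prefix][letter]: how often `letter` follows `prefix` at position k
    counts = [defaultdict(Counter) for _ in range(max_pos)]
    for w in words:
        for k in range(min(len(w), max_pos)):
            counts[k][w[:k]][w[k]] += 1

    entropies = []
    for k in range(max_pos):
        total = sum(sum(ctr.values()) for ctr in counts[k].values())
        if total == 0:
            # No word in the sample is long enough to have a letter here.
            entropies.append(float("nan"))
            continue
        h = 0.0
        for ctr in counts[k].values():
            n_ctx = sum(ctr.values())
            for n in ctr.values():
                # p(prefix, letter) * log2 p(letter | prefix)
                h -= (n / total) * math.log2(n / n_ctx)
        entropies.append(h)
    return entropies
```

Calling `positional_entropy(corpus)` on a long English text returns one value per letter position. Under the stated assumptions, the entry for position 0 would be expected to be the largest, in line with the abstract's claim that first letters are the hardest to predict.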
