Two Universality Properties Associated with the Monkey Model of Zipf's Law
Richard Perline, Ronald Perline

TL;DR
This paper demonstrates two universal properties of the monkey model for Zipf's law: the power law exponent approaches -1 with increasing alphabet size, and the distribution becomes approximately normal on a log scale for finite word lengths.
Contribution
It establishes the universality of the power law exponent and the normal approximation in the monkey model using general limit theorems, connecting Zipf's law to broader probabilistic principles.
Findings
Power law exponent converges to -1 as alphabet size grows.
Distribution is approximately normal on a log scale for finite word lengths.
Finite word length model yields a Zipf-lognormal mixture distribution.
Abstract
The distribution of word probabilities in the monkey model of Zipf's law is associated with two universality properties: (1) the power law exponent converges strongly to -1 as the alphabet size increases and the letter probabilities are specified as the spacings from a random division of the unit interval for any distribution with a bounded density function on [0, 1]; and (2) on a logarithmic scale, the version of the model with a finite word length cutoff and unequal letter probabilities is approximately normally distributed in the part of the distribution away from the tails. The first property is proved using a remarkably general limit theorem for the logarithm of sample spacings from Shao and Hahn, and the second property follows from Anscombe's central limit theorem for a random number of i.i.d. random variables. The finite word length model leads to a hybrid Zipf-lognormal…
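As a rough numerical illustration of property (1), the sketch below draws letter probabilities as uniform spacings of the unit interval and computes the implied power law exponent. It rests on assumptions not stated in the abstract: the exponent is taken to be -beta, where beta solves sum(q_i ** (1 / beta)) = 1 over the letter probabilities q_i (a standard characterization for the monkey model with unequal letter probabilities), and the last spacing is arbitrarily assigned to the space key. The names `spacings` and `zipf_exponent` are hypothetical helpers, and only the uniform case is tried here.

```python
import math
import random


def spacings(k, rng):
    """Split [0, 1] at k uniform random points; return the k + 1 gap lengths."""
    cuts = sorted(rng.random() for _ in range(k))
    edges = [0.0] + cuts + [1.0]
    return [b - a for a, b in zip(edges, edges[1:])]


def zipf_exponent(letter_probs, tol=1e-12):
    """Return beta with sum(q ** (1 / beta)) == 1, found by bisection.

    Assumes at least two letters and a total letter probability below 1,
    so that alpha = 1 / beta lies strictly between 0 and 1.
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if sum(q ** mid for q in letter_probs) > 1.0:
            lo = mid  # sum is still above 1, so alpha must be larger
        else:
            hi = mid
    return 2.0 / (lo + hi)


rng = random.Random(0)
for k in (2, 10, 100, 1000):
    gaps = spacings(k, rng)
    letters = gaps[:-1]  # assumption: the last gap is the space-key probability
    print(k, zipf_exponent(letters))
```

Under these assumptions, the printed beta values should drift toward 1 as the number of letters grows, which corresponds to the exponent -beta approaching -1. For 26 equally likely letters plus an equally likely space, the same function reduces to ln(27) / ln(26).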
