The Zipf law for random texts with unequal probabilities of occurrence of letters and the Pascal pyramid
V.V. Bochkarev, E.Yu. Lerner

TL;DR
This paper analyzes word probabilities in random texts whose letters are generated independently with unequal probabilities. It proves that the probability of a word has a power-law asymptotic as a function of its rank, gives an explicit formula for the exponent, and uses a simpler proof than previous work.
Contribution
It offers a new, elementary proof of the power-law behavior and derives an explicit formula for the exponent, which the earlier paper by B. Conrad and M. Mitzenmacher did not provide in this brief form.
Findings
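The probability of a word, taken as a function of its rank, has a power-law asymptotic.
An explicit formula for the exponent of the power law is derived.
The proof is brief and uses only elementary methods, unlike the earlier proof by B. Conrad and M. Mitzenmacher.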
Abstract
We model the generation of words with independent unequal probabilities of occurrence of letters. We prove that the probability of occurrence of a word of rank r has power-law asymptotics. In contrast to the earlier paper by B. Conrad and M. Mitzenmacher, we give a brief proof by elementary methods and obtain an explicit formula for the exponent of the power law.
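The model in the abstract can be explored numerically. The sketch below is not the authors' method. It assumes that a word is a run of letters ended by a space, that the letter probabilities and the space probability sum to 1, and that words longer than `max_len` can be ignored for the ranks of interest. The function names `word_probabilities` and `loglog_slope` and all parameter values are hypothetical. It enumerates word probabilities, sorts them by rank, and estimates the slope of log-probability against log-rank, which should be negative and roughly constant if a power law holds.

```python
import itertools
import math


def word_probabilities(letter_probs, space_prob, max_len):
    """Probabilities of all words of length 1..max_len, sorted by rank.

    Assumption: a word is a sequence of independent letters followed by
    a space, so its probability is the product of its letter
    probabilities times space_prob. Longer words are dropped.
    """
    probs = []
    for length in range(1, max_len + 1):
        for combo in itertools.product(letter_probs, repeat=length):
            probs.append(math.prod(combo) * space_prob)
    probs.sort(reverse=True)  # rank 1 is the most probable word
    return probs


def loglog_slope(probs, first_rank, last_rank):
    """Least-squares slope of log(probability) against log(rank)."""
    ranks = range(first_rank, last_rank + 1)
    xs = [math.log(r) for r in ranks]
    ys = [math.log(probs[r - 1]) for r in ranks]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


if __name__ == "__main__":
    # Hypothetical alphabet: three letters with unequal probabilities
    # plus a space; the four values sum to 1.
    probs = word_probabilities([0.4, 0.3, 0.1], 0.2, max_len=9)
    print("number of words:", len(probs))
    print("estimated log-log slope:", loglog_slope(probs, 10, 1000))
```

The fit is restricted to moderate ranks because the truncation at `max_len` distorts the tail of the ranking. The estimated slope is only an empirical check and is not a substitute for the explicit exponent derived in the paper.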
