Explaining Zipf's Law via Mental Lexicon
Armen E. Allahverdyan, Weibing Deng, and Q. A. Wang

TL;DR
This paper shows that Zipf's law in language can be derived from a Bayesian model that links random word probabilities to the mental lexicon of the text's author, which accounts for the law's universality and its variations across texts.
Contribution
It introduces a Bayesian framework connecting the mental lexicon to Zipf's law, providing a novel explanation for the law's emergence in language.
Findings
Zipf's law can be derived by assuming that words are drawn into a text with random probabilities.
The model explains why the law holds for a single text and accounts for its generalizations to high and low frequencies, including hapax legomena.
The mental lexicon influences the statistical distribution of words in texts.
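The first finding says Zipf's law follows once word probabilities are themselves random. A minimal Python sketch of that idea, under assumptions of our own: n word probabilities are drawn i.i.d. from an inverse-square density f(θ) ∝ θ^(-2) on a bounded interval [a, b] and normalized, and a text is then sampled token by token. This density, the parameter values, and every function name are hypothetical illustration choices, not the paper's model (the paper obtains its a priori density from the mental lexicon via Bayesian statistics). The inverse-square choice is used because the number of probabilities larger than θ then scales roughly as 1/θ, so the r-th largest scales roughly as 1/r, which is Zipf's law with exponent 1.

```python
import math
import random
from collections import Counter


def sample_probabilities(n, a, b, rng):
    """Draw n word probabilities i.i.d. from an ASSUMED inverse-square
    density f(theta) ~ theta**-2 on [a, b], then normalize them to sum to 1."""
    thetas = []
    for _ in range(n):
        u = rng.random()
        # Inverse CDF of theta**-2 on [a, b]: F(t) = (1/a - 1/t) / (1/a - 1/b)
        thetas.append(1.0 / (1.0 / a - u * (1.0 / a - 1.0 / b)))
    total = sum(thetas)
    return [t / total for t in thetas]


def generate_text(probs, length, rng):
    """Draw `length` tokens independently with the given word probabilities."""
    vocab = [f"w{k}" for k in range(len(probs))]
    return rng.choices(vocab, weights=probs, k=length)


def rank_frequency(tokens):
    """Word frequencies sorted in decreasing order; index 0 is rank 1."""
    return sorted(Counter(tokens).values(), reverse=True)


def loglog_slope(freqs, r_min, r_max):
    """Least-squares slope of log(frequency) against log(rank) on [r_min, r_max]."""
    xs = [math.log(r) for r in range(r_min, r_max + 1)]
    ys = [math.log(freqs[r - 1]) for r in range(r_min, r_max + 1)]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


rng = random.Random(0)  # fixed seed so the run is reproducible
probs = sample_probabilities(n=2000, a=1.0, b=1e5, rng=rng)
text = generate_text(probs, length=200_000, rng=rng)
freqs = rank_frequency(text)
slope = loglog_slope(freqs, 20, 500)
print(f"distinct words: {len(freqs)}, log-log slope on ranks 20-500: {slope:.2f}")
```

The fit skips the top ranks, where a handful of large, noisy draws dominate. The slope should come out close to -1, but its exact value depends on the random draw and on the assumed interval [a, b].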
Abstract
Zipf's law is the major regularity of statistical linguistics and has served as a prototype for rank-frequency relations and scaling laws in the natural sciences. Here we show that Zipf's law, together with its applicability to a single text and its generalizations to high and low frequencies (including hapax legomena), can be derived by assuming that words are drawn into the text with random probabilities. Their a priori density relates, via Bayesian statistics, to general features of the mental lexicon of the author who produced the text.
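The abstract refers to the rank-frequency relation of a single text and to hapax legomena (words occurring exactly once). A small Python sketch of how both can be read off a text: words are ranked by frequency, each observed frequency is set beside the plain exponent-1 Zipf estimate f1 / r, and the hapax legomena are listed. The tokenizer regex, the function names, and the toy sentence are illustrative assumptions; a sentence this short is far too small to test the law itself.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case word tokens; this simple regex is an illustrative assumption."""
    return re.findall(r"[a-z']+", text.lower())


def rank_frequency_table(tokens):
    """Rows of (rank, word, observed frequency, Zipf estimate f1 / rank).
    Ties keep first-occurrence order, as Counter.most_common does."""
    ranked = Counter(tokens).most_common()
    f1 = ranked[0][1]  # frequency of the most common word
    return [(r, w, f, f1 / r) for r, (w, f) in enumerate(ranked, start=1)]


def hapax_legomena(tokens):
    """Words that occur exactly once, in alphabetical order."""
    return sorted(w for w, f in Counter(tokens).items() if f == 1)


sample = "The cat sat on the mat and the dog sat on the log"
tokens = tokenize(sample)
for rank, word, freq, zipf in rank_frequency_table(tokens):
    print(f"{rank:>2} {word:<4} observed={freq} zipf={zipf:.2f}")
print("hapax legomena:", hapax_legomena(tokens))
```

On a real corpus the same table would be built from the full token list; the low-frequency tail, where hapax legomena sit, is where the simple f1 / r estimate is usually least accurate.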
