Dynamics of text generation with realistic Zipf distribution
Damián H. Zanette, Marcelo A. Montemurro

TL;DR
This paper presents a stochastic dynamical model of text generation to explain the origin of Zipf's law in written texts, and shows that the model's multiplicative dynamics yields rank-frequency distributions in quantitative agreement with empirical data.
Contribution
It introduces a dynamical model that incorporates both the general structure of languages and the memory effects involved in producing long coherent messages, giving a quantitative account of Zipf's law in human language.
Findings
Model reproduces empirical Zipf distributions
Supports linguistic relevance of Zipf's law
Highlights importance of multiplicative dynamics
Abstract
We investigate the origin of Zipf's law for words in written texts by means of a stochastic dynamical model for text generation. The model incorporates both features related to the general structure of languages and memory effects inherent to the production of long coherent messages in the communication process. It is shown that the multiplicative dynamics of our model leads to rank-frequency distributions in quantitative agreement with empirical data. Our results give support to the linguistic relevance of Zipf's law in human language.
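As a rough illustration of what "multiplicative dynamics" can mean in text generation, the sketch below implements Simon's classic word-reuse process, in which an already used word is repeated with probability proportional to its current count. This is a stand-in chosen for illustration and should not be read as the model of the paper: the abstract does not specify the update rule, and the language-structure and memory ingredients it mentions are not included here. The names `generate_text`, `rank_frequency`, and `new_word_prob` are hypothetical.

```python
import random
from collections import Counter


def generate_text(n_words, new_word_prob=0.1, seed=0):
    """Generate a sequence of integer word ids with a Simon-type process.

    Assumption: this is Simon's textbook model, used only as an example of
    multiplicative dynamics; it is not the specific model of the paper.
    """
    rng = random.Random(seed)
    text = [0]      # start with a single word, id 0
    next_id = 1     # id that the next brand-new word will receive
    while len(text) < n_words:
        if rng.random() < new_word_prob:
            # Introduce a word that has not appeared before.
            text.append(next_id)
            next_id += 1
        else:
            # Copy a uniformly chosen earlier token. A word with k previous
            # occurrences is therefore picked with probability proportional
            # to k, which is the multiplicative ("rich get richer") step.
            text.append(rng.choice(text))
    return text


def rank_frequency(text):
    """Return word counts sorted from most to least frequent (rank 1 first)."""
    return sorted(Counter(text).values(), reverse=True)
```

Calling `rank_frequency(generate_text(50000))` and plotting the counts against rank on log-log axes should give a roughly straight line for this toy process. Comparing such a curve with counts from a real corpus is one simple way to see where a plain Simon-type process falls short of empirical data.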
Taxonomy
Topics: Opinion Dynamics and Social Influence · Authorship Attribution and Profiling · Advanced Text Analysis Techniques
