Log-log Convexity of Type-Token Growth in Zipf's Systems
Francesc Font-Clos, Alvaro Corral

TL;DR
This paper shows that the traditional assumption that Zipf's law implies Heaps' law is flawed, derives alternative type-token growth curves for random systems, and offers an explanation for why real books follow these curves despite burstiness.
Contribution
It shows that a careful definition of Zipf's law leads to violations of Heaps' law in random systems, and it derives alternative growth curves whose universal data collapses depend only on Zipf's exponent.
Findings
Real books follow much the same growth patterns as random systems, despite burstiness in word occurrence.
Alternative growth curves depend only on Zipf's exponent.
Universal data collapses characterize the growth in Zipf's systems.
Abstract
It is traditionally assumed that Zipf's law implies the power-law growth of the number of different elements with the total number of elements in a system, the so-called Heaps' law. We show that a careful definition of Zipf's law leads to the violation of Heaps' law in random systems, and obtain alternative growth curves. These curves fulfill universal data collapses that depend only on the value of Zipf's exponent. We observe that real books behave very much in the same way as random systems, despite the presence of burstiness in word occurrence. We advance an explanation for this unexpected correspondence.
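The type-token growth discussed in the abstract can be explored numerically. The following is a minimal sketch, not the authors' procedure: it draws tokens independently from a rank-probability law p(r) proportional to r^(-alpha) over a finite vocabulary, which is only one possible reading of "Zipf's law" and may differ from the careful definition used in the paper. The names `zipf_weights`, `type_token_curve`, and `local_slopes`, as well as the vocabulary size, exponent, and sample length, are illustrative assumptions. A strict Heaps' law would require the local log-log slope to be the same in every window, so inspecting `slopes` gives a rough check.

```python
import math
import random


def zipf_weights(vocab_size, alpha):
    """Unnormalized rank probabilities p(r) ~ r^(-alpha) (assumed form)."""
    return [r ** -alpha for r in range(1, vocab_size + 1)]


def type_token_curve(tokens):
    """Number of distinct types seen after each token."""
    seen = set()
    curve = []
    for tok in tokens:
        seen.add(tok)
        curve.append(len(seen))
    return curve


def local_slopes(curve, checkpoints):
    """Log-log slope of the curve between consecutive token counts."""
    slopes = []
    for a, b in zip(checkpoints, checkpoints[1:]):
        rise = math.log(curve[b - 1]) - math.log(curve[a - 1])
        run = math.log(b) - math.log(a)
        slopes.append(rise / run)
    return slopes


# Hypothetical parameters chosen only for illustration.
VOCAB_SIZE = 50_000
ALPHA = 1.0
N_TOKENS = 100_000

rng = random.Random(0)  # fixed seed for reproducibility
weights = zipf_weights(VOCAB_SIZE, ALPHA)
tokens = rng.choices(range(VOCAB_SIZE), weights=weights, k=N_TOKENS)

curve = type_token_curve(tokens)
checkpoints = [100, 1_000, 10_000, 100_000]
slopes = local_slopes(curve, checkpoints)

for (a, b), s in zip(zip(checkpoints, checkpoints[1:]), slopes):
    print(f"tokens {a:>6} to {b:>6}: local log-log slope = {s:.3f}")
```

Replacing `tokens` with the word sequence of a real book would allow the same comparison between a text and a random system that the abstract describes.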
