Compression and the origins of Zipf's law for word frequencies
Ramon Ferrer-i-Cancho

TL;DR
This paper presents a new derivation of Zipf's law for word frequencies based on optimal coding, linking linguistic patterns to cognitive pressures for efficient communication.
Contribution
It introduces a derivation of Zipf's law that starts from realistic cognitive and communicative pressures and requires no fine-tuning of parameters, unlike Mandelbrot's random typing model.
Findings
Zipf's law can be derived from optimal coding principles.
The model does not require fine-tuning of parameters.
It suggests linguistic laws originate from cognitive pressures for efficient communication.
Abstract
Here we sketch a new derivation of Zipf's law for word frequencies based on optimal coding. The structure of the derivation is reminiscent of Mandelbrot's random typing model, but it has multiple advantages over random typing: (1) it starts from realistic cognitive pressures, (2) it does not require fine-tuning of parameters, and (3) it sheds light on the origins of other statistical laws of language and thus can lead to a compact theory of linguistic laws. Our findings suggest that the recurrence of Zipf's law in human languages could originate from pressure for easy and fast communication.
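To make "optimal coding" concrete, here is a minimal Python sketch of one way to read it. It is an illustration, not the paper's derivation. It assumes a coding scheme in which every word gets a distinct string over an alphabet of a given size, with the shortest available strings going to the most frequent words. The abstract does not spell out the coding scheme, so this choice, the function names code_length and mean_code_length, and the toy probabilities are all assumptions made for the example.

```python
from itertools import count


def code_length(rank: int, alphabet_size: int) -> int:
    """Length of the string assigned to the word with this frequency rank
    (1 = most frequent) when shorter strings are used up first.

    Assumption for this sketch: distinct words only need distinct strings.
    """
    if rank < 1 or alphabet_size < 2:
        raise ValueError("rank must be >= 1 and alphabet_size >= 2")
    covered = 0  # number of distinct strings of length <= current length
    for length in count(1):
        covered += alphabet_size ** length
        if rank <= covered:
            return length


def mean_code_length(probabilities, alphabet_size: int) -> float:
    """Mean string length when words are ranked by decreasing probability."""
    ranked = sorted(probabilities, reverse=True)
    return sum(
        p * code_length(rank, alphabet_size)
        for rank, p in enumerate(ranked, start=1)
    )


if __name__ == "__main__":
    # Hypothetical toy distribution over four words, binary alphabet.
    print([code_length(r, 2) for r in range(1, 8)])
    print(mean_code_length([0.5, 0.25, 0.125, 0.125], 2))
```

Under this assignment, string length grows roughly with the logarithm of frequency rank, so frequent words end up short. This is the kind of pressure for "easy and fast communication" that the abstract refers to.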
