Compression and the origins of Zipf's law for word frequencies

Ramon Ferrer-i-Cancho

arXiv:1605.01326 · cs.CL · September 24, 2020

TL;DR

This paper presents a new derivation of Zipf's law for word frequencies based on optimal coding, linking linguistic patterns to cognitive pressures for efficient communication.

Contribution

The paper derives Zipf's law from realistic cognitive pressures without fine-tuning of parameters, two advantages over Mandelbrot's random typing model, and attributes the law's origin to pressure for efficient communication.

Findings

01 Zipf's law can be derived from optimal coding principles.

02 The model does not require fine-tuning of parameters.

03 It suggests linguistic laws originate from cognitive pressures for efficient communication.

Abstract

Here we sketch a new derivation of Zipf's law for word frequencies based on optimal coding. The structure of the derivation is reminiscent of Mandelbrot's random typing model, but it has multiple advantages over random typing: (1) it starts from realistic cognitive pressures, (2) it does not require fine-tuning of parameters, and (3) it sheds light on the origins of other statistical laws of language and thus can lead to a compact theory of linguistic laws. Our findings suggest that the recurrence of Zipf's law in human languages could originate from pressure for easy and fast communication.
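
For readers unfamiliar with the random typing baseline mentioned in the abstract, the following is a minimal sketch of it, not the paper's optimal-coding derivation. It assumes a two-letter alphabet, a fixed space probability, and uniformly chosen letters; the names `random_typing`, `rank_frequency`, `alphabet`, and `p_space` are illustrative and do not come from the paper. It prints word frequencies at a few ranks so the heavy-tailed decay of frequency with rank can be inspected.

```python
import random
from collections import Counter


def random_typing(n_chars, alphabet="ab", p_space=0.2, seed=0):
    """Type n_chars characters at random and return the non-empty words.

    Assumption (illustrative): each keystroke is a space with probability
    p_space, otherwise a letter drawn uniformly from `alphabet`.
    """
    rng = random.Random(seed)
    chars = []
    for _ in range(n_chars):
        if rng.random() < p_space:
            chars.append(" ")
        else:
            chars.append(rng.choice(alphabet))
    # str.split() with no argument drops the empty strings that
    # consecutive spaces would otherwise produce.
    return "".join(chars).split()


def rank_frequency(words):
    """Return word frequencies in decreasing order (index 0 is rank 1)."""
    return sorted(Counter(words).values(), reverse=True)


words = random_typing(200_000)
freqs = rank_frequency(words)

# Frequencies at geometrically spaced ranks; under a Zipf-like law they
# should fall off roughly as a power of the rank.
for rank in (1, 2, 4, 8, 16, 32):
    print(rank, freqs[rank - 1])
```

The space probability and alphabet size are free parameters of this sketch, and the shape of the resulting rank-frequency curve depends on them, which is the kind of parameter dependence the abstract contrasts with the optimal-coding derivation.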
