On the class of coding optimality of human languages and the origins of Zipf's law
Ramon Ferrer-i-Cancho

TL;DR
This paper introduces a new class of coding optimality that explains Zipf's law in human languages and some animal communication systems, linking frequency distributions to coding efficiency and optimality.
Contribution
It defines a novel class of optimality for coding systems, connects Zipf's law, the size-rank law, and the size-probability law within that class, and explores the implications for human and animal communication.
Findings
Human languages that agree sufficiently with Zipf's law are potential members of the new optimality class.
Some animal communication systems exhibit exponential distributions.
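A straight line in a log-log plot of frequency versus rank is informative about the lengths of optimal codes under non-singular coding and under uniquely decodable coding.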
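The findings above rest on reading a frequency-versus-rank plot in double logarithmic scale. As a minimal sketch of that reading (not the paper's method), the following Python estimates the slope of log frequency against log rank by ordinary least squares. The function names `rank_frequency` and `loglog_slope` and the synthetic data are illustrative assumptions, and a least-squares fit is only a rough diagnostic of a power law rather than a rigorous test.

```python
import math
from collections import Counter


def rank_frequency(tokens):
    """Return type frequencies sorted in decreasing order (rank 1 first)."""
    return sorted(Counter(tokens).values(), reverse=True)


def loglog_slope(freqs):
    """Least-squares slope and intercept of log(frequency) versus log(rank)."""
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic check (assumed data): frequencies following f(r) = 1000 / r exactly,
# i.e. a power law with exponent 1, which is a straight line in log-log scale.
zipf_freqs = [1000.0 / r for r in range(1, 51)]
slope, intercept = loglog_slope(zipf_freqs)
print(round(slope, 3))  # -1.0
```

For real text, `rank_frequency(text.split())` would supply the frequencies. An exponential distribution, by contrast, would bend away from a straight line in this scale, so a single fitted slope would describe it poorly.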
Abstract
Here we present a new class of optimality for coding systems. Members of that class are displaced linearly from optimal coding and thus exhibit Zipf's law, namely a power-law distribution of frequency ranks. Within that class, Zipf's law, the size-rank law and the size-probability law form a group-like structure. We identify human languages that are members of the class. All languages showing sufficient agreement with Zipf's law are potential members of the class. In contrast, there are communication systems in other species that cannot be members of that class because they exhibit an exponential distribution instead, though those of dolphins and humpback whales might be. We provide a new insight into plots of frequency versus rank in double logarithmic scale. For any system, a straight line in that scale indicates that the lengths of optimal codes under non-singular coding and under uniquely decodable…
Taxonomy
Topics: semigroups and automata theory
