# Optimal coding and the origins of Zipfian laws

**Authors:** Ramon Ferrer-i-Cancho, Christian Bentz, Caio Seguin

arXiv: 1906.01545 · 2020-09-24

## TL;DR

This paper shows that optimal coding predicts Zipf's law of abbreviation and, combined with the maximum entropy principle, Zipf's rank-frequency distribution. It also argues that random typing is itself an optimal coding process, contrary to the common assumption that it is detached from cost-cutting considerations.

## Contribution

The paper applies optimal coding to linguistic laws, first under an arbitrary coding scheme and then under non-singular coding, and shows that Zipfian patterns follow from minimizing code length.

## Key findings

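- Optimal coding, under an arbitrary coding scheme, predicts Zipf's law of abbreviation: more frequent words tend to be shorter.
- Optimal non-singular coding predicts that word length grows approximately as the logarithm of frequency rank.
- Optimal non-singular coding combined with the maximum entropy principle predicts Zipf's rank-frequency distribution.
- Random typing is shown to be an optimal coding process.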
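
The logarithmic growth of word length with frequency rank can be illustrated with a minimal sketch. It assumes an alphabet of `alphabet_size` symbols and gives each rank, from most to least frequent, the shortest string not yet in use. The function name and its parameters are hypothetical and are not taken from the paper.

```python
def optimal_nonsingular_lengths(num_types: int, alphabet_size: int) -> list[int]:
    """Code length for each frequency rank 1..num_types.

    Assumption: codes only need to be distinct strings (non-singular coding),
    so the most frequent types take all strings of length 1, then all strings
    of length 2, and so on.
    """
    if alphabet_size < 1:
        raise ValueError("alphabet_size must be at least 1")

    lengths = []
    length = 1
    remaining = alphabet_size  # unused strings of the current length
    for _ in range(num_types):
        if remaining == 0:
            # All strings of this length are taken: move to the next length,
            # which offers alphabet_size ** length distinct strings.
            length += 1
            remaining = alphabet_size ** length
        lengths.append(length)
        remaining -= 1
    return lengths


if __name__ == "__main__":
    # Binary alphabet: 2 strings of length 1, 4 of length 2, 8 of length 3.
    print(optimal_nonsingular_lengths(7, 2))  # → [1, 1, 2, 2, 2, 2, 3]
```

Under these assumptions, with an alphabet of $N \geq 2$ symbols, the number of strings of length at most $l$ is $N + N^2 + \dots + N^l$, so the length assigned to rank $i$ is $\lceil \log_N((1 - 1/N)\, i + 1) \rceil$, which grows logarithmically with $i$.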

## Abstract

The problem of compression in standard information theory consists of assigning codes as short as possible to numbers. Here we consider the problem of optimal coding -- under an arbitrary coding scheme -- and show that it predicts Zipf's law of abbreviation, namely a tendency in natural languages for more frequent words to be shorter. We apply this result to investigate optimal coding also under so-called non-singular coding, a scheme where unique segmentation is not warranted but codes stand for a distinct number. Optimal non-singular coding predicts that the length of a word should grow approximately as the logarithm of its frequency rank, which is again consistent with Zipf's law of abbreviation. Optimal non-singular coding in combination with the maximum entropy principle also predicts Zipf's rank-frequency distribution. Furthermore, our findings on optimal non-singular coding challenge common beliefs about random typing. It turns out that random typing is in fact an optimal coding process, in stark contrast with the common assumption that it is detached from cost cutting considerations. Finally, we discuss the implications of optimal coding for the construction of a compact theory of Zipfian laws and other linguistic laws.
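
The link between optimal coding and the law of abbreviation can be checked on a toy example. The sketch below is a brute-force illustration, not the authors' derivation: it assumes a fixed multiset of available code lengths and a hypothetical three-word vocabulary, and searches all assignments for the one that minimizes the mean code length $\sum_i p_i l_i$.

```python
from itertools import permutations


def mean_code_length(probabilities, lengths):
    """Expected code length when word i has probability p_i and length l_i."""
    return sum(p * l for p, l in zip(probabilities, lengths))


def best_assignments(probabilities, lengths, tol=1e-12):
    """All orderings of `lengths` that minimize the mean code length.

    Exhaustive search over permutations; only practical for tiny vocabularies.
    """
    costs = {
        perm: mean_code_length(probabilities, perm)
        for perm in set(permutations(lengths))
    }
    best = min(costs.values())
    return sorted(perm for perm, cost in costs.items() if cost - best <= tol)


if __name__ == "__main__":
    # Hypothetical word probabilities, listed from most to least frequent.
    probabilities = [0.5, 0.3, 0.2]
    # Available code lengths, deliberately given out of order.
    lengths = [3, 1, 2]
    print(best_assignments(probabilities, lengths))  # → [(1, 2, 3)]
```

In this example the only optimal assignment gives the shortest code to the most frequent word and the longest code to the least frequent one. Any swap that places a longer code on a more frequent word increases the mean length.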

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1906.01545/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/1906.01545/full.md

## References

70 references — full list in the complete paper: https://tomesphere.com/paper/1906.01545/full.md

---
Source: https://tomesphere.com/paper/1906.01545