Word forms - not just their lengths - are optimized for efficient communication
Stephan C. Meylan, Thomas L. Griffiths

TL;DR
This paper shows that word forms are optimized for efficient communication not only through their length but also through their distinctiveness. Distinctiveness predicts word frequency better than length does across 13 languages, consistent with processing constraints on recognizing words.
Contribution
It introduces phonological information content as a measure of word distinctiveness and demonstrates its superiority over length in predicting word frequency across 13 languages.
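Phonological information content is described in the abstract as a word's string probability under a statistical model of a language's sound or character sequences. The following is a minimal sketch of that idea, assuming a character bigram model with add-one smoothing trained on a list of word types, and a hypothetical `#` symbol marking word boundaries. The names `train_bigram` and `information_content` are illustrative only, and the paper's own language models may differ. The score is -log2 P(word) in bits, so improbable strings score higher.

```python
import math
from collections import Counter

BOUNDARY = "#"  # hypothetical symbol marking the start and end of a word

def train_bigram(words):
    """Count character bigrams over a list of word types (each padded with BOUNDARY)."""
    bigrams = Counter()   # (previous symbol, next symbol) -> count
    contexts = Counter()  # previous symbol -> count
    alphabet = {BOUNDARY}
    for word in words:
        symbols = [BOUNDARY] + list(word) + [BOUNDARY]
        alphabet.update(symbols)
        for prev, nxt in zip(symbols, symbols[1:]):
            bigrams[(prev, nxt)] += 1
            contexts[prev] += 1
    return bigrams, contexts, alphabet

def information_content(word, model):
    """Return -log2 P(word) in bits under an add-one-smoothed bigram model.

    Symbols never seen in training still receive the smoothed floor
    probability, so this is a rough score rather than a fully normalized
    model over an open alphabet.
    """
    bigrams, contexts, alphabet = model
    vocab_size = len(alphabet)
    symbols = [BOUNDARY] + list(word) + [BOUNDARY]
    bits = 0.0
    for prev, nxt in zip(symbols, symbols[1:]):
        prob = (bigrams[(prev, nxt)] + 1) / (contexts[prev] + vocab_size)
        bits -= math.log2(prob)
    return bits

if __name__ == "__main__":
    lexicon = ["cat", "cap", "can", "bat", "tap"]  # toy word list, not from the paper
    model = train_bigram(lexicon)
    for w in ["cat", "tab", "zqx"]:
        print(w, round(information_content(w, model), 2))
```

Because every additional symbol adds a positive number of bits, this score grows with word length when other factors are held fixed. That is one concrete way to read length as a component of distinctiveness rather than as a separate property.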
Findings
Distinctiveness outperforms length in predicting frequency
Phonological information content captures word distinctiveness effectively
Cross-linguistic evidence supports processing constraints shaping word forms
Abstract
The inverse relationship between the length of a word and the frequency of its use, first identified by G.K. Zipf in 1935, is a classic empirical law that holds across a wide range of human languages. We demonstrate that length is one aspect of a much more general property of words: how distinctive they are with respect to other words in a language. Distinctiveness plays a critical role in recognizing words in fluent speech, in that it reflects the strength of potential competitors when selecting the best candidate for an ambiguous signal. Phonological information content, a measure of a word's string probability under a statistical model of a language's sound or character sequences, concisely captures distinctiveness. Examining large-scale corpora from 13 languages, we find that distinctiveness significantly outperforms word length as a predictor of frequency. This finding provides…
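The central comparison above, frequency predicted by length versus by distinctiveness, can be illustrated with a rank correlation. The sketch below swaps in Spearman's rho as a simple stand-in; the paper's actual statistical models are not described in this summary and likely differ. `compare_predictors` is a hypothetical helper. Pass it per-word corpus frequencies, word lengths, and information-content scores from any string model, and it reports how closely each predictor's ranking tracks frequency. Under Zipf's inverse length-frequency relationship, the length coefficient would be expected to be negative.

```python
import math

def ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation; raises ValueError if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / (norm_x * norm_y)

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed inputs."""
    return pearson(ranks(x), ranks(y))

def compare_predictors(freqs, lengths, info_contents):
    """Hypothetical helper: rank correlation of frequency with each predictor."""
    return {
        "length": spearman(freqs, lengths),
        "information_content": spearman(freqs, info_contents),
    }
```

Rank correlation only measures monotonic association between one predictor and frequency. A fuller comparison would fit both predictors jointly and compare model fit, which this sketch does not attempt.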
Taxonomy
Topics: Language and cultural evolution · Natural Language Processing Techniques · Authorship Attribution and Profiling
