Word reuse and combination support efficient communication of emerging concepts
Aotao Xu, Charles Kemp, Lea Frermann, Yang Xu

TL;DR
This paper presents an information-theoretic framework explaining how word reuse and combination support efficient communication of new concepts, tested against reuse items and compounds that emerged in English, French and Finnish over the past century.
Contribution
It introduces a unified account of lexicalization strategies based on a tradeoff between word length and informativeness, validated with cross-linguistic historical data.
Findings
Historically emerging reuse items and compounds are more efficient than hypothetical alternatives.
Literal reuse and compounds are more efficient than non-literal forms.
Both strategies align with principles of efficient communication.
Abstract
A key function of the lexicon is to express novel concepts as they emerge over time through a process known as lexicalization. The most common lexicalization strategies are the reuse and combination of existing words, but they have typically been studied separately in the areas of word meaning extension and word formation. Here we offer an information-theoretic account of how both strategies are constrained by a fundamental tradeoff between competing communicative pressures: word reuse tends to preserve the average length of word forms at the cost of less precision, while word combination tends to produce more informative words at the expense of greater word length. We test our proposal against a large dataset of reuse items and compounds that appeared in English, French and Finnish over the past century. We find that these historically emerging items achieve higher levels of…
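The tradeoff described in the abstract can be illustrated with a toy calculation. This is a minimal sketch, not the paper's actual model: the word forms, the listener probabilities, and the helper names `surprisal_bits` and `evaluate` are all invented for illustration, and information loss is assumed here to be the surprisal of the intended concept given the word form.

```python
import math

def surprisal_bits(listener_probs, target):
    """Information loss in bits: -log2 of the probability the listener assigns to the target."""
    return -math.log2(listener_probs[target])

# Hypothetical listener beliefs P(concept | word form); the numbers are invented.
candidates = {
    # Reuse: a short existing form, but ambiguous between an old and a new sense.
    "mouse": {"rodent": 0.7, "pointing device": 0.3},
    # Combination: a longer form that pins down the new concept more precisely.
    "computer mouse": {"rodent": 0.05, "pointing device": 0.95},
}

def evaluate(candidates, target):
    """Return (form, length in letters, information loss in bits) for each candidate form."""
    rows = []
    for form, probs in candidates.items():
        length = len(form.replace(" ", ""))  # count letters, ignoring spaces
        rows.append((form, length, surprisal_bits(probs, target)))
    return rows

rows = evaluate(candidates, "pointing device")
for form, length, cost in rows:
    print(f"{form!r}: length={length}, information loss={cost:.2f} bits")
```

Under these assumed numbers, the reused form is shorter but leaves the listener more uncertain, while the compound is longer but more informative, which is the pattern the abstract describes.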
