TL;DR
This paper introduces a new information-theoretic measure, the sample Rényi entropy, to quantify homophony in languages and uses it to reexamine existing hypotheses, finding no clear pressure either towards or against homophony.
Contribution
It proposes the sample Rényi entropy as a new measure of a language's homophony and uses it to reassess the claims of Trott and Bergen (2020), addressing a specific methodological issue in their experiments.
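As background, the standard Rényi entropy of order α for a distribution p is H_α = log(Σ p_i^α) / (1 − α), with Shannon entropy as the α → 1 limit. The sketch below is a plain plug-in estimate of that quantity from a list of wordform samples. It is an illustration only: the function name `renyi_entropy`, the default α = 2, and the toy lexicon are assumptions made here, and the paper's actual sample-based estimator is not described in this summary and may differ.

```python
import math
from collections import Counter


def renyi_entropy(samples, alpha=2.0):
    """Plug-in Rényi entropy (in bits) of the empirical distribution of `samples`.

    Assumption: this is the textbook definition applied to empirical
    frequencies, not necessarily the estimator used in the paper.
    """
    counts = Counter(samples)
    total = sum(counts.values())
    probs = [c / total for c in counts.values()]
    if alpha == 1.0:
        # The limit alpha -> 1 recovers Shannon entropy.
        return -sum(p * math.log2(p) for p in probs)
    return math.log2(sum(p ** alpha for p in probs)) / (1.0 - alpha)


# Hypothetical toy lexicon: one wordform per meaning, so repeated
# wordforms stand in for homophones.
wordforms = ["bank", "bank", "bat", "bat", "tree", "river"]
print(renyi_entropy(wordforms, alpha=2.0))
```

In this toy framing, the more meanings share a wordform, the more concentrated the wordform distribution becomes and the lower the entropy, which is the intuition behind using an entropy as a homophony measure.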
Findings
No clear pressure towards or against homophony once the methodological issue is corrected
Reevaluation challenges previous theories favoring homophony's prevalence
Provides a nuanced view of homophony's role in language
Abstract
Homophony's widespread presence in natural languages is a controversial topic. Recent theories of language optimality have tried to justify its prevalence, despite its negative effects on cognitive processing time; e.g., Piantadosi et al. (2012) argued homophony enables the reuse of efficient wordforms and is thus beneficial for languages. This hypothesis has recently been challenged by Trott and Bergen (2020), who posit that good wordforms are more often homophonous simply because they are more phonotactically probable. In this paper, we join in on the debate. We first propose a new information-theoretic quantification of a language's homophony: the sample Rényi entropy. Then, we use this quantification to revisit Trott and Bergen's claims. While their point is theoretically sound, a specific methodological issue in their experiments raises doubts about their results. After…
