Re-evaluating phoneme frequencies
Jayden L. Macklin-Cordes, Erich R. Round

TL;DR
This study re-evaluates phoneme frequency distributions in 166 Australian languages using a maximum likelihood framework. It finds a Zipfian-like pattern among the most frequent phonemes and an exponential pattern among the least frequent ones.
Contribution
It fits power laws and three alternative distributions to phoneme frequencies by maximum likelihood, replacing earlier evaluation methods that the power-law debate showed to be unreliable. The results support earlier findings while adding nuance to them.
Findings
Zipfian-like distribution among most frequent phonemes
Exponential distribution among least frequent phonemes
Supports earlier Zipfian findings with refined analysis
Abstract
Causal processes can give rise to distinctive distributions in the linguistic variables that they affect. Consequently, a secure understanding of a variable's distribution can hold a key to understanding the forces that have causally shaped it. A storied distribution in linguistics has been Zipf's law, a kind of power law. In the wake of a major debate in the sciences around power-law hypotheses and the unreliability of earlier methods of evaluating them, here we re-evaluate the distributions claimed to characterize phoneme frequencies. We infer the fit of power laws and three alternative distributions to 166 Australian languages, using a maximum likelihood framework. We find evidence supporting earlier results, but also nuancing them and increasing our understanding of them. Most notably, phonemic inventories appear to have a Zipfian-like frequency structure among their most-frequent…
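As a rough illustration of the kind of comparison the abstract describes, the sketch below fits two candidate distributions to a rank-ordered list of phoneme counts by maximum likelihood and compares their log-likelihoods. It is a minimal sketch, not the authors' pipeline: modelling probability as a function of frequency rank, the truncated Zipf and truncated exponential forms, the function names, and the toy counts are all assumptions made for illustration, and the study's other alternative distributions are not covered.

```python
import math

def zipf_loglik(counts, s):
    """Log-likelihood of a truncated Zipf model, p(r) proportional to r**(-s).

    counts[r - 1] is the count of the r-th most frequent phoneme (assumed layout).
    """
    k = len(counts)
    log_norm = math.log(sum(r ** -s for r in range(1, k + 1)))
    return sum(c * (-s * math.log(r) - log_norm)
               for r, c in enumerate(counts, start=1))

def exp_loglik(counts, lam):
    """Log-likelihood of a truncated exponential model, p(r) proportional to exp(-lam * r)."""
    k = len(counts)
    log_norm = math.log(sum(math.exp(-lam * r) for r in range(1, k + 1)))
    return sum(c * (-lam * r - log_norm)
               for r, c in enumerate(counts, start=1))

def fit(loglik, counts, lo=0.0, hi=5.0, tol=1e-8):
    """Maximize a one-parameter log-likelihood by golden-section search.

    Both models above are exponential families in their parameter, so the
    log-likelihood is concave and has a single maximum on [lo, hi].
    Returns (estimated parameter, maximized log-likelihood).
    """
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if loglik(counts, c) > loglik(counts, d):
            b = d  # maximum lies in [a, d]
        else:
            a = c  # maximum lies in [c, b]
    best = (a + b) / 2
    return best, loglik(counts, best)

if __name__ == "__main__":
    # Synthetic toy counts for 20 hypothetical phonemes, NOT data from the study.
    toy_counts = [round(10000 / r) for r in range(1, 21)]
    s_hat, ll_zipf = fit(zipf_loglik, toy_counts)
    lam_hat, ll_exp = fit(exp_loglik, toy_counts)
    print(f"Zipf exponent: {s_hat:.3f}, log-likelihood: {ll_zipf:.1f}")
    print(f"Exponential rate: {lam_hat:.3f}, log-likelihood: {ll_exp:.1f}")
```

Because both models here have a single free parameter, the one with the higher maximized log-likelihood fits the counts better. Comparing models with different numbers of parameters would normally call for a likelihood-ratio test or an information criterion rather than raw log-likelihoods.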
