Morphological Irregularity Correlates with Frequency
Shijie Wu, Ryan Cotterell, Timothy J. O'Donnell

TL;DR
This study introduces an information-theoretic measure of morphological irregularity and provides evidence, across 28 languages, that irregularity correlates with frequency, supporting longstanding proposals from the linguistics literature.
Contribution
The paper uses a neural transduction model to estimate an information-theoretic irregularity measure and provides what the authors describe as the first evidence of this breadth (28 languages) linking irregularity with frequency.
Findings
Higher frequency items tend to be more irregular.
Irregular items are generally more frequent.
The correlation is more robust when aggregated at the level of whole paradigms.
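These findings are correlational claims at two granularities: individual forms and whole paradigms. A minimal sketch of how such a check could be run, assuming a Spearman rank correlation between frequency and irregularity, and assuming a paradigm-level aggregation that sums frequency and averages irregularity per lemma. Neither choice is confirmed as the paper's exact procedure, and the rows are invented toy numbers, not data from the paper.

```python
from collections import defaultdict
from statistics import mean


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the last position of the run of ties starting at i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def spearman(xs, ys):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(ranks(xs), ranks(ys))


def paradigm_level(rows):
    """Collapse (lemma, frequency, irregularity) rows to one point per lemma:
    summed frequency and mean irregularity (an assumed aggregation)."""
    freq = defaultdict(float)
    irr = defaultdict(list)
    for lemma, f, s in rows:
        freq[lemma] += f
        irr[lemma].append(s)
    lemmas = sorted(freq)
    return [freq[l] for l in lemmas], [mean(irr[l]) for l in lemmas]


# Invented toy rows (lemma, corpus count, irregularity in bits); not from the paper.
rows = [
    ("go", 900, 6.1), ("go", 500, 7.4),
    ("sing", 200, 3.2), ("sing", 60, 4.0),
    ("walk", 120, 0.3), ("walk", 80, 0.5),
]
form_rho = spearman([f for _, f, _ in rows], [s for _, _, s in rows])
para_rho = spearman(*paradigm_level(rows))
print(f"form-level rho={form_rho:.3f}, paradigm-level rho={para_rho:.3f}")
```

A rank correlation is used here because corpus frequencies are heavy-tailed and a monotone trend only depends on their ordering; pooling forms into paradigms also averages away form-level noise, which is one way an aggregated correlation can come out cleaner.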
Abstract
We present a study of morphological irregularity. Following recent work, we define an information-theoretic measure of irregularity based on the predictability of forms in a language. Using a neural transduction model, we estimate this quantity for the forms in 28 languages. We first present several validatory and exploratory analyses of irregularity. We then show that our analyses provide evidence for a correlation between irregularity and frequency: higher frequency items are more likely to be irregular and irregular items are more likely to be highly frequent. To our knowledge, this result is the first of its breadth and confirms longstanding proposals from the linguistics literature. The correlation is more robust when aggregated at the level of whole paradigms, providing support for models of linguistic structure in which inflected forms are unified by abstract underlying stems or…
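The abstract ties irregularity to how (un)predictable a form is. A minimal sketch, assuming the measure takes the form of a surprisal, -log2 p(form), of the attested form under a model that predicts it. The hypothetical `toy_distribution` below is a hand-written stand-in for the paper's neural transducer, and the candidate lists and the 0.9 weight on the regular "-ed" form are invented for illustration.

```python
import math


def toy_distribution(lemma, candidates, p_regular=0.9):
    """Hypothetical stand-in for a trained transducer's output distribution:
    p_regular goes to the regular '-ed' past form and the remaining mass is
    split evenly over the other candidate forms."""
    regular = lemma + "ed"
    others = [c for c in candidates if c != regular]
    if not others:
        return {regular: 1.0}
    dist = {regular: p_regular}
    for c in others:
        dist[c] = (1.0 - p_regular) / len(others)
    return dist


def irregularity(observed, dist):
    """Surprisal in bits, -log2 p(observed), of the attested form."""
    p = dist.get(observed, 0.0)
    return math.inf if p == 0.0 else -math.log2(p)


walk = toy_distribution("walk", ["walked", "welk", "walkt"])
go = toy_distribution("go", ["goed", "went", "gone"])
print(irregularity("walked", walk))  # predictable form: low surprisal
print(irregularity("went", go))      # unpredictable form: high surprisal
```

Returning infinity for a form the model assigns zero probability is what surprisal means for an impossible event; a real sequence model would spread some mass over every output string, so attested irregular forms get a large but finite score.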
Taxonomy
Topics: Natural Language Processing Techniques · Language and cultural evolution · Topic Modeling
