Information-Theoretic Characterization of Vowel Harmony: A Cross-Linguistic Study on Word Lists
Julius Steuer, Badr Abdullah, Johann-Mattis List, Dietrich Klakow

TL;DR
This study introduces an information-theoretic measure of vowel harmony, estimated with phoneme-level language models trained on small, cross-linguistic word lists. The approach makes under-studied languages accessible to analysis and shows that limited data can suffice.
Contribution
It presents a novel, data-driven approach to quantify vowel harmony across languages using minimal, lemma-based word lists and neural language models, expanding research to low-resource languages.
Findings
Neural phoneme-level language models (PLMs) capture vowel harmony patterns.
Word lists are valuable resources for typological research.
The method enables analysis of under-studied languages with limited data.
Abstract
We present a cross-linguistic study that aims to quantify vowel harmony using data-driven computational modeling. Concretely, we define an information-theoretic measure of harmonicity based on the predictability of vowels in a natural language lexicon, which we estimate using phoneme-level language models (PLMs). Prior quantitative studies have relied heavily on inflected word-forms in the analysis of vowel harmony. We instead train our models using cross-linguistically comparable lemma forms with little or no inflection, which enables us to cover more under-studied languages. Training data for our PLMs consists of word lists with a maximum of 1000 entries per language. Although the data we employ are substantially smaller than previously used corpora, our experiments demonstrate that neural PLMs capture vowel harmony patterns in a set of languages that exhibit this…
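To illustrate the idea of harmonicity as vowel predictability, here is a minimal sketch. It is not the authors' measure. It swaps the neural PLM for simple count-based estimates on the vowel tier and scores a lexicon by how much of a vowel's entropy is explained by the preceding vowel: 1 - H(V_i | V_{i-1}) / H(V_i), which is the normalized mutual information between adjacent vowels. The vowel inventory, the function names (`vowel_tier`, `harmonicity`), and the toy word lists are all hypothetical. Each word is treated as a sequence of single-character segments.

```python
import math
from collections import Counter

# Hypothetical vowel inventory; a real study would take it from segment annotations.
VOWELS = frozenset("aeiouyøɯ")


def vowel_tier(word):
    """Project a segmented word (here: one character per phoneme) onto its vowels."""
    return [seg for seg in word if seg in VOWELS]


def entropy(counts):
    """Plug-in (maximum-likelihood) Shannon entropy, in bits, of a Counter."""
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def harmonicity(words):
    """Hypothetical score: share of vowel entropy explained by the previous vowel.

    Uses count-based bigram estimates over adjacent vowel pairs instead of a
    neural phoneme-level language model. Returns 0.0 when undefined.
    """
    pairs = Counter()
    for word in words:
        tier = vowel_tier(word)
        pairs.update(zip(tier, tier[1:]))
    if not pairs:
        return 0.0
    prev, cur = Counter(), Counter()
    for (p, v), c in pairs.items():
        prev[p] += c
        cur[v] += c
    h_cur = entropy(cur)
    if h_cur == 0:
        return 0.0
    # Chain rule: H(V_i | V_{i-1}) = H(V_{i-1}, V_i) - H(V_{i-1})
    h_cond = entropy(pairs) - entropy(prev)
    return 1 - h_cond / h_cur


# Toy lexicons: fully harmonic vs. vowels combined freely.
print(harmonicity(["kele", "kala", "tili", "toko"]))
print(harmonicity(["kala", "kale", "kela", "kele"]))
```

In the first toy lexicon, each vowel fully determines the next, so the score is 1.0. In the second, the preceding vowel carries no information about the next one, so the score is 0.0. A neural PLM would replace these plug-in counts with model-estimated conditional probabilities over the full phoneme context.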
Taxonomy
Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Phonetics and Phonology Research
