A Probabilistic Generative Model of Linguistic Typology
Johannes Bjerva, Yova Kementchedjhieva, Ryan Cotterell, and Isabelle Augenstein

TL;DR
This paper introduces a probabilistic generative model for linguistic typology that leverages language embeddings and covariance structures to accurately predict typological features and reveal correlations between features and languages.
Contribution
It develops a novel exponential-family matrix factorisation model that captures feature covariance and demonstrates improved prediction of typological features over baselines.
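To make the idea of exponential-family matrix factorisation concrete, here is a minimal sketch, not the authors' implementation. It assumes binary typological features and a Bernoulli (logistic) likelihood, one member of the exponential family, and it fits language and feature embeddings by plain gradient descent on the observed cells only. The function name `fit_logistic_mf`, the toy matrix, and all hyperparameters are hypothetical choices made for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def fit_logistic_mf(Y, mask, dim=4, lr=0.1, l2=0.01, epochs=500, seed=0):
    """Fit P(Y[l, f] = 1) = sigmoid(lang[l] @ feat[f]) using observed cells only.

    Y    : (n_langs, n_feats) binary matrix of feature values
    mask : same shape, 1 where the value is observed, 0 where it is held out
    """
    rng = np.random.default_rng(seed)
    n_langs, n_feats = Y.shape
    lang = 0.1 * rng.standard_normal((n_langs, dim))  # language embeddings
    feat = 0.1 * rng.standard_normal((n_feats, dim))  # feature embeddings
    losses = []
    eps = 1e-9
    for _ in range(epochs):
        P = sigmoid(lang @ feat.T)
        # Masked Bernoulli negative log-likelihood at the current parameters.
        nll = -np.sum(mask * (Y * np.log(P + eps) + (1 - Y) * np.log(1 - P + eps)))
        losses.append(nll)
        # Gradient of the NLL with respect to the logits, zeroed on held-out cells.
        G = (P - Y) * mask
        grad_lang = G @ feat + l2 * lang
        grad_feat = G.T @ lang + l2 * feat
        lang -= lr * grad_lang
        feat -= lr * grad_feat
    return lang, feat, losses


# Toy data (assumed, not from the paper): two groups of languages whose
# features co-vary, with two cells hidden from training.
Y = np.array([
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 1],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0],
], dtype=float)
mask = np.ones_like(Y)
mask[0, 4] = 0.0  # held-out cell
mask[3, 0] = 0.0  # held-out cell

lang, feat, losses = fit_logistic_mf(Y, mask)
P = sigmoid(lang @ feat.T)
print("held-out predictions:", P[0, 4], P[3, 0])
```

Because every language and every feature shares the same embedding space, a held-out cell is predicted from the co-variation patterns learned on the observed cells, which is the intuition behind exploiting structural similarities between languages.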
Findings
High accuracy in predicting held-out features
Language embeddings enable generalisation to unseen languages
Correlations between typological features and languages are confirmed
Abstract
In the principles-and-parameters framework, the structural features of languages depend on parameters that may be toggled on or off, with a single parameter often dictating the status of multiple features. The implied covariance between features inspires our probabilisation of this line of linguistic inquiry---we develop a generative model of language based on exponential-family matrix factorisation. By modelling all languages and features within the same architecture, we show how structural similarities between languages can be exploited to predict typological features with near-perfect accuracy, outperforming several baselines on the task of predicting held-out features. Furthermore, we show that language embeddings pre-trained on monolingual text allow for generalisation to unobserved languages. This finding has clear practical and also theoretical implications: the results confirm…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Language and cultural evolution
