Tracking Typological Traits of Uralic Languages in Distributed Language Representations
Johannes Bjerva, Isabelle Augenstein

TL;DR
This paper investigates whether distributed language representations encode typological features of Uralic languages, and whether they improve model transfer between those languages in neural networks.
Contribution
It demonstrates that typological traits of Uralic languages can be automatically inferred from distributed representations, revealing their potential for linguistic analysis.
Findings
Some typological features are accurately predicted from representations
Language representations improve model transfer between Uralic languages
Typological traits are encoded at various stages of fine-tuning
Abstract
Although linguistic typology has a long history, computational approaches have only recently gained popularity. The use of distributed representations in computational linguistics has also become increasingly popular. A recent development is to learn distributed representations of language, such that typologically similar languages are spatially close to one another. Although empirical successes have been shown for such language representations, they have not been subjected to much typological probing. In this paper, we first look at whether language representations of this type are empirically useful for model transfer between Uralic languages in deep neural networks. We then investigate which typological features are encoded in these representations by attempting to predict features in the World Atlas of Language Structures, at various stages of fine-tuning of the representations. We…
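The probing idea in the abstract, predicting a typological feature of a language from its distributed representation, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the nearest-neighbour classifier, the cosine similarity, the three-dimensional vectors, and the placeholder names `lang_a` to `lang_d` and `value_1`/`value_2` are invented, and none of it is taken from the paper or from the World Atlas of Language Structures.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def predict_feature(target, embeddings, labels, k=1):
    """Predict the feature value of `target` by majority vote among its
    k most similar labelled languages (the target itself is held out)."""
    neighbours = sorted(
        (lang for lang in labels if lang != target),
        key=lambda lang: cosine(embeddings[target], embeddings[lang]),
        reverse=True,
    )[:k]
    votes = Counter(labels[lang] for lang in neighbours)
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(embeddings, labels, k=1):
    """Fraction of languages whose held-out feature value is predicted correctly."""
    correct = sum(
        predict_feature(lang, embeddings, labels, k) == labels[lang]
        for lang in labels
    )
    return correct / len(labels)


# Hypothetical toy data: invented vectors and invented feature values.
embeddings = {
    "lang_a": [0.9, 0.1, 0.0],
    "lang_b": [0.8, 0.2, 0.1],
    "lang_c": [0.1, 0.9, 0.2],
    "lang_d": [0.0, 0.8, 0.3],
}
labels = {
    "lang_a": "value_1",
    "lang_b": "value_1",
    "lang_c": "value_2",
    "lang_d": "value_2",
}

print(predict_feature("lang_a", embeddings, labels))
print(leave_one_out_accuracy(embeddings, labels))
```

Running the same evaluation on representations saved at different points of training would be one way to compare how well a feature is encoded at each stage, which is the kind of comparison the abstract describes.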
