Continuous multilinguality with language vectors
Robert Östling, Jörg Tiedemann

TL;DR
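This paper introduces continuous language vectors for multilingual NLP. Instead of treating each language as a discrete category, a single model conditions on a learned vector for each language. This helps the model handle many languages and varieties, and the vectors turn out to capture genetic relationships between languages.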
Contribution
The paper proposes representing languages as continuous vectors learned with a character-based neural language model. These vectors improve inference about language varieties not seen during training and capture relationships between languages.
Findings
Language vectors improve inference on unseen language varieties.
Vectors capture genetic relationships between languages.
Language vectors can be learned efficiently with a character-based neural language model.
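One simple way to make "capturing genetic relationships" concrete is to compare language vectors by cosine similarity and look up each language's nearest neighbor. The sketch below does exactly that. The vectors are hypothetical three-dimensional toy values, not vectors from the paper, and nearest-neighbor lookup is our choice of probe, not necessarily the authors' analysis.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def nearest_language(query, vectors):
    """Return the language (other than `query`) whose vector is most similar."""
    return max(
        (lang for lang in vectors if lang != query),
        key=lambda lang: cosine(vectors[query], vectors[lang]),
    )


# Hypothetical toy vectors (ISO 639-3 codes as keys); NOT learned values.
toy_vectors = {
    "swe": [0.9, 0.1, 0.0],
    "dan": [0.8, 0.2, 0.1],
    "fin": [0.0, 0.9, 0.4],
    "est": [0.1, 0.8, 0.5],
}

for lang in toy_vectors:
    print(lang, "->", nearest_language(lang, toy_vectors))
```

With learned vectors, related languages (here, the toy Scandinavian pair and the toy Finnic pair) would be expected to end up as each other's neighbors; the toy values are constructed so that they do.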
Abstract
Most existing models for multilingual natural language processing (NLP) treat language as a discrete category, and make predictions for either one language or the other. In contrast, we propose using continuous vector representations of language. We show that these can be learned efficiently with a character-based neural language model, and used to improve inference about language varieties not seen during training. In experiments with 1303 Bible translations into 990 different languages, we empirically explore the capacity of multilingual language models, and also show that the language vectors capture genetic relationships between languages.
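The summary does not describe the model architecture, so the following is a minimal sketch under our own assumptions. It shows the core idea of conditioning a character-level language model on a continuous language vector by concatenating that vector to every character embedding. A plain recurrent cell with random, untrained weights stands in for the paper's network; the class name, dimensions, and the language labels `lang_a`/`lang_b` are hypothetical.

```python
import math
import random


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(z):
    """Numerically stable softmax over a list of scores."""
    top = max(z)
    exps = [math.exp(s - top) for s in z]
    total = sum(exps)
    return [e / total for e in exps]


class LangConditionedCharRNN:
    """Hypothetical toy character-level RNN (untrained, random weights).
    At every step the input is the character embedding concatenated
    with a language vector."""

    def __init__(self, alphabet, languages, char_dim=8, lang_dim=4, hidden=16, seed=0):
        rng = random.Random(seed)

        def rand(rows, cols):
            return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.alphabet = list(alphabet)
        self.index = {ch: i for i, ch in enumerate(self.alphabet)}
        self.char_emb = rand(len(self.alphabet), char_dim)
        # One continuous vector per training language; a real model would
        # train these together with the network weights.
        self.lang_vec = dict(zip(languages, rand(len(languages), lang_dim)))
        self.W_in = rand(hidden, char_dim + lang_dim)
        self.W_h = rand(hidden, hidden)
        self.W_out = rand(len(self.alphabet), hidden)
        self.hidden = hidden

    def next_char_probs(self, text, lang_vector):
        """Next-character distribution given a prefix and ANY language
        vector (a training language or a point between languages)."""
        h = [0.0] * self.hidden
        for ch in text:
            x = self.char_emb[self.index[ch]] + list(lang_vector)  # concatenation
            pre = [a + b for a, b in zip(matvec(self.W_in, x), matvec(self.W_h, h))]
            h = [math.tanh(p) for p in pre]
        return dict(zip(self.alphabet, softmax(matvec(self.W_out, h))))


model = LangConditionedCharRNN("abc ", ["lang_a", "lang_b"])
p_a = model.next_char_probs("ab c", model.lang_vec["lang_a"])
# Because languages are vectors, the model can also be conditioned on
# the midpoint between two of them.
midpoint = [0.5 * (u + v) for u, v in zip(model.lang_vec["lang_a"], model.lang_vec["lang_b"])]
p_mid = model.next_char_probs("ab c", midpoint)
```

Feeding the language vector at every step lets it influence each prediction. Because the vector is continuous, the model accepts points that are not any training language, which is what makes it possible to reason about unseen varieties at all; this sketch makes no claim about how well such points work in practice.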
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
