Continuous multilinguality with language vectors

Robert Östling; Jörg Tiedemann

arXiv:1612.07486·cs.CL·March 21, 2017·2 cites

Open Access

TL;DR

This paper represents languages as continuous vectors rather than discrete categories, learns these vectors with a character-based neural language model, and shows that they improve inference on language varieties not seen during training and capture genetic relationships between languages.

Contribution

The paper proposes learning continuous language vectors with a character-based neural language model, as an alternative to treating language as a discrete category. Experiments use 1303 Bible translations into 990 languages.

Findings

01

Language vectors improve inference on unseen language varieties.

02

Vectors capture genetic relationships between languages.

03

Language vectors can be learned efficiently with a character-based neural language model.
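If the vectors reflect genetic relationships, related languages should lie close together in the vector space. One simple way to probe this is to rank languages by cosine similarity to a query language. This is a minimal sketch under stated assumptions: the helpers `cosine` and `nearest_languages` are hypothetical, the three-dimensional vectors are made up for illustration and are not data from the paper, and the paper's own analysis may use a different method (hierarchical clustering would be another natural choice).

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def nearest_languages(lang_vecs, query, k=2):
    """Return the k languages whose vectors are most similar to lang_vecs[query]."""
    q = lang_vecs[query]
    scored = [(cosine(q, v), name) for name, v in lang_vecs.items() if name != query]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:k]]


# Made-up vectors for illustration only; real ones would come from a trained model.
toy_vectors = {
    "swe": [0.9, 0.1, 0.0],
    "nob": [0.85, 0.15, 0.05],
    "deu": [0.6, 0.4, 0.1],
    "fin": [0.0, 0.2, 0.95],
}

result = nearest_languages(toy_vectors, "swe")  # → ['nob', 'deu']
```

With real learned vectors, the same ranking (or a dendrogram built from pairwise similarities) can be compared against an established language family tree.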

Abstract

Most existing models for multilingual natural language processing (NLP) treat language as a discrete category, and make predictions for either one language or the other. In contrast, we propose using continuous vector representations of language. We show that these can be learned efficiently with a character-based neural language model, and used to improve inference about language varieties not seen during training. In experiments with 1303 Bible translations into 990 different languages, we empirically explore the capacity of multilingual language models, and also show that the language vectors capture genetic relationships between languages.
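The core idea, conditioning a single character-level language model on a continuous language vector instead of a discrete language label, can be illustrated with a toy model. This is a minimal sketch, not the authors' architecture: the class `LangConditionedCharRNN` is hypothetical, it uses a plain single-layer tanh RNN chosen for brevity (the paper's network may differ), the weights are random and untrained, and the language codes and dimensions are illustrative. The one property it demonstrates is that the language vector is concatenated to every character embedding, so any point in the vector space is a valid model input.

```python
import math
import random


def init_matrix(rows, cols, rng, scale=0.1):
    """Small random matrix stored as a list of rows (illustrative initialization)."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    """Matrix-vector product for list-of-rows matrices."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(z):
    """Numerically stable softmax."""
    top = max(z)
    exps = [math.exp(x - top) for x in z]
    total = sum(exps)
    return [e / total for e in exps]


class LangConditionedCharRNN:
    """Hypothetical toy character-level RNN language model.

    At each step the input is [character embedding ; language vector]. Weights
    are random here; a trained model would learn the network weights and the
    language vectors jointly by maximizing the likelihood of the training text.
    """

    def __init__(self, alphabet, languages, char_dim=8, lang_dim=4, hidden_dim=16, seed=0):
        rng = random.Random(seed)
        self.alphabet = list(alphabet)
        self.index = {c: i for i, c in enumerate(self.alphabet)}
        self.char_emb = init_matrix(len(self.alphabet), char_dim, rng)
        # One continuous vector per training language (random until trained).
        self.lang_vec = {
            lang: [rng.gauss(0.0, 0.1) for _ in range(lang_dim)] for lang in languages
        }
        self.w_in = init_matrix(hidden_dim, char_dim + lang_dim, rng)
        self.w_rec = init_matrix(hidden_dim, hidden_dim, rng)
        self.w_out = init_matrix(len(self.alphabet), hidden_dim, rng)
        self.hidden_dim = hidden_dim

    def log_prob(self, text, lang_vector):
        """Return sum_t log P(c_t | c_<t, language vector) for `text`."""
        h = [0.0] * self.hidden_dim
        prev = self.index[" "]  # assumption: space doubles as the start symbol
        total = 0.0
        for ch in text:
            x = self.char_emb[prev] + list(lang_vector)  # concatenate the two vectors
            pre = [a + b for a, b in zip(matvec(self.w_in, x), matvec(self.w_rec, h))]
            h = [math.tanh(v) for v in pre]
            probs = softmax(matvec(self.w_out, h))
            total += math.log(probs[self.index[ch]])
            prev = self.index[ch]
        return total


model = LangConditionedCharRNN("abcdefghijklmnopqrstuvwxyz ", ["eng", "deu", "swe"])
eng_score = model.log_prob("in the beginning", model.lang_vec["eng"])

# Because the language is a vector, an unseen point such as the midpoint
# between two training languages is also a valid input.
mid = [(a + b) / 2 for a, b in zip(model.lang_vec["deu"], model.lang_vec["swe"])]
mid_score = model.log_prob("in the beginning", mid)
```

Since the model is untrained, the scores are close to those of a uniform distribution over the 27 symbols and carry no linguistic meaning; the point is the interface, where a discrete language ID would only allow the three training languages as inputs.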

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis