Unsupervised Hyperalignment for Multilingual Word Embeddings

Jean Alaux; Edouard Grave; Marco Cuturi; Armand Joulin

arXiv:1811.01124·cs.CL·June 6, 2019·41 cites

Open Access

TL;DR

This paper introduces an unsupervised method for aligning the word embeddings of multiple languages in a common space. Its formulation keeps the learned mappings composable, which improves the quality of indirect word translation.

Contribution

It extends unsupervised alignment of word embeddings from two languages to multiple languages, using a new formulation that ensures the mappings are composable.

Findings

01

Improved indirect translation accuracy across eleven languages.

02

Maintained competitive performance on direct word translation.

03

Showed that composable mappings lead to better alignments than independently mapping every language to a pivot language.

Abstract

We consider the problem of aligning continuous word representations, learned in multiple languages, to a common space. It was recently shown that, in the case of two languages, it is possible to learn such a mapping without supervision. This paper extends this line of work to the problem of aligning multiple languages to a common space. A solution is to independently map all languages to a pivot language. Unfortunately, this degrades the quality of indirect word translation. We thus propose a novel formulation that ensures composable mappings, leading to better alignments. We evaluate our method by jointly aligning word vectors in eleven languages, showing consistent improvement with indirect mappings while maintaining competitive performance on direct word translation.
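To make the pivot baseline and the idea of composing mappings concrete, here is a minimal toy sketch. It is not the paper's method: it swaps in supervised orthogonal Procrustes on synthetic, noise-free data, whereas the paper learns the mappings without supervision. All names (`procrustes`, `random_rotation`, `lang_a`, `lang_b`, `pivot`) are hypothetical and chosen for illustration only.

```python
import numpy as np

def procrustes(X, Y):
    """Orthogonal Q minimizing ||X @ Q - Y||_F, with rows of X and Y paired.

    Assumption: this supervised solver stands in for the unsupervised
    alignment described in the abstract.
    """
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

def random_rotation(rng, d):
    """Random orthogonal d x d matrix obtained from a QR decomposition."""
    Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
    return Q

rng = np.random.default_rng(0)
n, d = 50, 5  # toy vocabulary size and embedding dimension

# Synthetic "pivot language" vectors; the two other languages are
# exact rotations of the pivot (no noise, purely illustrative).
pivot = rng.standard_normal((n, d))
lang_a = pivot @ random_rotation(rng, d).T
lang_b = pivot @ random_rotation(rng, d).T

# Baseline from the abstract: map each language to the pivot independently.
Q_a = procrustes(lang_a, pivot)
Q_b = procrustes(lang_b, pivot)

# Indirect mapping a -> b, obtained by composing a -> pivot with pivot -> b.
# For an orthogonal Q_b, the inverse mapping is its transpose.
Q_ab = Q_a @ Q_b.T

error = np.linalg.norm(lang_a @ Q_ab - lang_b)
print(f"indirect alignment error: {error:.2e}")
```

In this idealized setting the composed mapping recovers the second language almost exactly, because each language is a perfect rotation of the pivot. Real embeddings are only approximately related by a rotation, so errors from the two independent pivot mappings can accumulate when they are composed, which is the degradation of indirect translation that the abstract sets out to address.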

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis