Unsupervised Cross-lingual Word Embedding by Multilingual Neural Language Models

Takashi Wada; Tomoharu Iwata

arXiv:1809.02306 · cs.CL · September 10, 2018 · 19 cites

TL;DR

This paper introduces an unsupervised approach that uses multilingual neural language models with shared LSTMs to learn cross-lingual word embeddings without parallel data. The approach remains effective even when only a small amount of monolingual data is available.

Contribution

The paper presents an unsupervised method that learns cross-lingual embeddings with bidirectional LSTMs shared across languages. It outperforms existing unsupervised models when monolingual data is scarce and when the corpora of different languages come from different domains.

Findings

01 Outperforms existing unsupervised models in word alignment tasks.
02 Effective with only 50k sentences of monolingual data.
03 Handles domain differences across languages successfully.
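
The word alignment evaluation in the findings relies on comparing words of different languages inside the shared embedding space. The following is a hedged sketch that assumes alignment is scored as nearest-neighbour retrieval with cosine similarity and precision@k; the page does not state the exact protocol. The helper names (cosine, nearest_neighbours, precision_at_k), the toy embeddings, and the gold dictionary are all hypothetical and exist only for illustration.

```python
import math


def cosine(u, v):
    # Cosine similarity between two vectors; 0.0 if either vector is all zeros
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def nearest_neighbours(query, target_emb, k=1):
    """Return the k target-language words most cosine-similar to `query`."""
    scored = sorted(target_emb.items(), key=lambda item: cosine(query, item[1]), reverse=True)
    return [word for word, _ in scored[:k]]


def precision_at_k(src_emb, tgt_emb, gold, k=1):
    """Fraction of source words whose gold translation is among their k nearest target words."""
    hits = sum(1 for s, t in gold.items() if t in nearest_neighbours(src_emb[s], tgt_emb, k))
    return hits / len(gold)


# Made-up toy embeddings, treated as if already mapped into one shared space
en = {"dog": [0.9, 0.1, 0.0], "cat": [0.1, 0.9, 0.1], "car": [0.0, 0.1, 0.9]}
fr = {"chien": [0.8, 0.2, 0.1], "chat": [0.2, 0.8, 0.0], "voiture": [0.1, 0.0, 1.0]}
gold = {"dog": "chien", "cat": "chat", "car": "voiture"}

print(nearest_neighbours(en["dog"], fr, k=2))  # ['chien', 'chat']
print(precision_at_k(en, fr, gold, k=1))       # 1.0
```

Cosine similarity is a common choice because it ignores vector length, but other retrieval criteria could be substituted without changing the rest of the sketch.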

Abstract

We propose an unsupervised method to obtain cross-lingual embeddings without any parallel data or pre-trained word embeddings. The proposed model, which we call multilingual neural language models, takes sentences of multiple languages as input. The proposed model contains bidirectional LSTMs that serve as forward and backward language models, and these networks are shared among all the languages. The other parameters, i.e., the word embeddings and the linear transformations between hidden states and outputs, are specific to each language. The shared LSTMs can capture the sentence structure common to all languages. Accordingly, the word embeddings of each language are mapped into a common latent space, making it possible to measure the similarity of words across multiple languages. We evaluate the quality of the cross-lingual word embeddings on a word alignment task. Our experiments…
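
The parameter-sharing scheme described in the abstract can be sketched in pure Python. This is a minimal sketch under stated assumptions, not the authors' implementation. The class names LSTMCell and MultilingualLM are hypothetical, gate biases and sentence-boundary tokens are omitted, and each language uses one output matrix for both the forward and backward directions (an assumption). There is no training loop. The sketch only shows which parameters are shared and how a bidirectional language-model loss could be computed for a sentence in one language.

```python
import math
import random


def matvec(m, v):
    # Dense matrix-vector product on nested lists
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


class LSTMCell:
    """Hypothetical minimal LSTM cell (gate biases omitted for brevity)."""

    def __init__(self, input_dim, hidden_dim, rng):
        self.hidden_dim = hidden_dim
        # One weight matrix per gate, applied to the concatenation [x; h]
        self.W = {g: rand_matrix(hidden_dim, input_dim + hidden_dim, rng) for g in "ifog"}

    def step(self, x, h, c):
        xh = x + h  # list concatenation, not addition
        i = [sigmoid(z) for z in matvec(self.W["i"], xh)]
        f = [sigmoid(z) for z in matvec(self.W["f"], xh)]
        o = [sigmoid(z) for z in matvec(self.W["o"], xh)]
        g = [math.tanh(z) for z in matvec(self.W["g"], xh)]
        c = [fi * ci + ii * gi for fi, ci, ii, gi in zip(f, c, i, g)]
        h = [oi * math.tanh(ci) for oi, ci in zip(o, c)]
        return h, c


class MultilingualLM:
    """Hypothetical sketch: LSTMs shared by all languages, embeddings and outputs per language."""

    def __init__(self, vocab_sizes, emb_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        # Shared by every language: the forward and backward LSTMs
        self.fwd = LSTMCell(emb_dim, hidden_dim, rng)
        self.bwd = LSTMCell(emb_dim, hidden_dim, rng)
        # Language-specific: input embeddings and hidden-to-output matrices.
        # Assumption: one output matrix per language, used by both directions.
        self.emb = {lang: rand_matrix(v, emb_dim, rng) for lang, v in vocab_sizes.items()}
        self.out = {lang: rand_matrix(v, hidden_dim, rng) for lang, v in vocab_sizes.items()}

    def _run(self, cell, lang, ids):
        # Feed word ids through `cell`; return a distribution over `lang`'s vocabulary per step
        h = [0.0] * cell.hidden_dim
        c = [0.0] * cell.hidden_dim
        dists = []
        for t in ids:
            h, c = cell.step(self.emb[lang][t], h, c)
            dists.append(softmax(matvec(self.out[lang], h)))
        return dists

    def nll(self, lang, ids):
        """Negative log-likelihood of a sentence under the forward and backward LMs."""
        rev = ids[::-1]
        fwd = self._run(self.fwd, lang, ids)  # fwd[t] predicts ids[t + 1]
        bwd = self._run(self.bwd, lang, rev)  # bwd[t] predicts rev[t + 1]
        loss = 0.0
        for t in range(len(ids) - 1):
            loss -= math.log(fwd[t][ids[t + 1]])
            loss -= math.log(bwd[t][rev[t + 1]])
        return loss


model = MultilingualLM({"en": 5, "fr": 6}, emb_dim=4, hidden_dim=3)
print(model.nll("en", [0, 2, 4]), model.nll("fr", [1, 5, 3]))
```

Every language passes through the same self.fwd and self.bwd cells. Training all languages jointly would therefore push each language's embeddings toward inputs that the shared LSTMs can use. This matches the abstract's explanation of why the embeddings end up in a common latent space.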

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification