Word Translation Without Parallel Data
Alexis Conneau, Guillaume Lample, Marc'Aurelio Ranzato, Ludovic Denoyer, Hervé Jégou

TL;DR
This paper presents an unsupervised method for building bilingual dictionaries by aligning monolingual word embeddings without parallel data, outperforming some supervised methods and working well across diverse language pairs.
Contribution
The authors introduce a novel unsupervised approach to create bilingual dictionaries without parallel corpora, extending applicability to distant and low-resource language pairs.
Findings
Outperforms existing supervised methods on cross-lingual tasks for some language pairs, without using character information
Works effectively for distant language pairs like English-Russian and English-Chinese
Demonstrates potential for unsupervised machine translation in low-resource settings
Abstract
State-of-the-art methods for learning cross-lingual word embeddings have relied on bilingual dictionaries or parallel corpora. Recent studies showed that the need for parallel data supervision can be alleviated with character-level information. While these methods showed encouraging results, they are not on par with their supervised counterparts and are limited to pairs of languages sharing a common alphabet. In this work, we show that we can build a bilingual dictionary between two languages without using any parallel corpora, by aligning monolingual word embedding spaces in an unsupervised way. Without using any character information, our model even outperforms existing supervised methods on cross-lingual tasks for some language pairs. Our experiments demonstrate that our method works very well also for distant language pairs, like English-Russian or English-Chinese. We finally…
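The abstract does not spell out how the two embedding spaces are aligned. As a minimal sketch under stated assumptions, the example below fits an orthogonal (Procrustes) mapping on a small set of seed pairs and then builds a dictionary by cosine nearest-neighbour search. The data is synthetic, the function names are hypothetical, and the seed pairs are assumed to be given; in an unsupervised setting they would have to be obtained without supervision.

```python
import numpy as np


def procrustes(X, Y):
    """Return the orthogonal W minimizing ||X @ W.T - Y||_F."""
    U, _, Vt = np.linalg.svd(Y.T @ X)
    return U @ Vt


def normalize(M):
    """Scale each row to unit length so dot products are cosine similarities."""
    return M / np.linalg.norm(M, axis=1, keepdims=True)


def induce_dictionary(X, Y, W):
    """For each source row, return the index of the most similar target row."""
    mapped = normalize(X @ W.T)
    sims = mapped @ normalize(Y).T
    return sims.argmax(axis=1)


# Synthetic stand-in for two monolingual spaces (assumption): the target
# space is an exact rotation of the source space, so row i matches row i.
rng = np.random.default_rng(0)
n, d = 200, 16
X = rng.standard_normal((n, d))
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))  # random orthogonal matrix
Y = X @ Q.T

seed = np.arange(50)  # hypothetical seed pairs used to fit the mapping
W = procrustes(X[seed], Y[seed])
pred = induce_dictionary(X, Y, W)
accuracy = float((pred == np.arange(n)).mean())
print(f"dictionary accuracy on synthetic data: {accuracy:.2f}")
```

Real embedding spaces are only approximately related by a rotation, so accuracy on real language pairs would be lower than on this toy data.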
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Authorship Attribution and Profiling
