Exploiting Similarities among Languages for Machine Translation

Tomas Mikolov; Quoc V. Le; Ilya Sutskever

arXiv:1309.4168 · cs.CL · September 18, 2013 · 1.4k cites

TL;DR

This paper presents a simple method for automating dictionary and phrase table generation in machine translation: it learns distributed word representations from monolingual data and a linear mapping between the languages' vector spaces, reaching almost 90% precision@5 for English-Spanish word translation.

Contribution

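It introduces an approach that makes few assumptions about the languages involved: word representations are learned from large monolingual data, and the mapping between languages is learned from a small amount of bilingual data, so existing dictionaries and phrase tables can be extended and refined.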

Findings

01

Achieves nearly 90% precision@5 for English-Spanish word translation

02

Makes few assumptions about the languages, so it can be applied to any language pair

03

Automates extension of translation dictionaries and phrase tables

Abstract

Dictionaries and phrase tables are the basis of modern statistical machine translation systems. This paper develops a method that can automate the process of generating and extending dictionaries and phrase tables. Our method can translate missing word and phrase entries by learning language structures based on large monolingual data and mapping between languages from small bilingual data. It uses distributed representation of words and learns a linear mapping between vector spaces of languages. Despite its simplicity, our method is surprisingly effective: we can achieve almost 90% precision@5 for translation of words between English and Spanish. This method makes little assumption about the languages, so it can be used to extend and refine dictionaries and translation tables for any language pairs.
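
The pipeline described in the abstract (word vectors per language, a linear map fitted on a small seed dictionary, translation by nearest neighbour, scoring with precision@5) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the vectors are synthetic rather than trained on monolingual corpora, the map is fitted with closed-form least squares via `numpy.linalg.lstsq` (which may differ from the optimizer the authors used), and the function names `fit_linear_map`, `top_k_translations`, and `precision_at_k` are hypothetical.

```python
import numpy as np

def fit_linear_map(X, Z):
    """Least-squares W such that X @ W approximates Z (rows are word vectors)."""
    W, *_ = np.linalg.lstsq(X, Z, rcond=None)
    return W

def top_k_translations(x, W, target_vecs, k=5):
    """Indices of the k target vectors most cosine-similar to the mapped x."""
    mapped = x @ W
    norms = np.linalg.norm(target_vecs, axis=1) * np.linalg.norm(mapped)
    sims = (target_vecs @ mapped) / (norms + 1e-12)
    return np.argsort(-sims)[:k]

def precision_at_k(X_test, gold, W, target_vecs, k=5):
    """Fraction of test words whose gold translation is among the top k."""
    hits = sum(
        g in top_k_translations(x, W, target_vecs, k)
        for x, g in zip(X_test, gold)
    )
    return hits / len(gold)

# Synthetic stand-in for two embedding spaces (assumption: real vectors
# would come from models trained on large monolingual corpora).
rng = np.random.default_rng(0)
n_words, d_src, d_tgt = 200, 20, 15
src = rng.normal(size=(n_words, d_src))
true_map = rng.normal(size=(d_src, d_tgt))
tgt = src @ true_map + 0.01 * rng.normal(size=(n_words, d_tgt))

# Word i in the source vocabulary translates to word i in the target one.
train = np.arange(150)      # small "seed dictionary" used to fit the map
test = np.arange(150, 200)  # held-out entries to translate

W = fit_linear_map(src[train], tgt[train])
p5 = precision_at_k(src[test], test, W, tgt, k=5)
print(f"precision@5 on held-out words: {p5:.2f}")
```

Because the synthetic target space is an almost exact linear image of the source space, the score printed here says nothing about real language pairs; the sketch only shows how the pieces fit together.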

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Semantic Web and Ontologies