Machine Translation with Cross-lingual Word Embeddings

Marco Berlot; Evan Kaplan

arXiv:1912.10167 · cs.CL · April 15, 2020

TL;DR

This paper explores the use of cross-lingual word embeddings to create a shared semantic space for multiple languages, enabling better transfer learning and machine translation when data for some languages is limited.

Contribution

It introduces a method for learning a unified word embedding space across languages, facilitating cross-lingual tasks and improving translation quality.
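This listing does not say how the authors build the shared space. The sketch below shows one widely used way to obtain such a space. It is an illustration, not the paper's method: an orthogonal map is learned from one language's embeddings to the other's using a small seed dictionary, and a word is then translated by taking its nearest neighbour in the target space. The vectors, the English/Italian word pairs, and the helper names (`fit_rotation`, `rotate`, `translate`) are hypothetical. To avoid any library dependency, the sketch uses the rotation-only 2-D case of orthogonal Procrustes alignment.

```python
import math

# Hypothetical toy embeddings (2-D stand-ins for real word vectors).
# By construction, the Italian space is the English space rotated 90 degrees.
EN = {
    "dog": (1.0, 0.1),
    "cat": (0.9, 0.3),
    "car": (-0.2, 1.0),
    "bus": (-0.4, 0.9),
}
IT = {
    "cane": (-0.1, 1.0),
    "gatto": (-0.3, 0.9),
    "macchina": (-1.0, -0.2),
    "autobus": (-0.9, -0.4),
}

# Hypothetical seed dictionary of known translation pairs used to fit the map.
SEED = [("dog", "cane"), ("car", "macchina")]


def fit_rotation(pairs):
    """Rotation-only orthogonal Procrustes in 2-D.

    Returns the angle theta maximizing sum(y . R(theta) x) over the pairs,
    which has the closed form atan2(sum of cross terms, sum of dot terms).
    """
    dot_sum = cross_sum = 0.0
    for src, tgt in pairs:
        x1, x2 = EN[src]
        y1, y2 = IT[tgt]
        dot_sum += x1 * y1 + x2 * y2
        cross_sum += x1 * y2 - x2 * y1
    return math.atan2(cross_sum, dot_sum)


def rotate(vec, theta):
    """Apply the 2-D rotation R(theta) to a vector."""
    x1, x2 = vec
    c, s = math.cos(theta), math.sin(theta)
    return (c * x1 - s * x2, s * x1 + c * x2)


def cosine(u, v):
    """Cosine similarity between two 2-D vectors."""
    return (u[0] * v[0] + u[1] * v[1]) / (math.hypot(*u) * math.hypot(*v))


def translate(word, theta):
    """Map an English word into the Italian space and return its nearest neighbour."""
    mapped = rotate(EN[word], theta)
    return max(IT, key=lambda it_word: cosine(mapped, IT[it_word]))


theta = fit_rotation(SEED)
print(translate("cat", theta), translate("bus", theta))
```

Running it prints `gatto autobus`. Neither `cat` nor `bus` is in the seed dictionary, yet both land on their counterparts once the rotation fitted on `dog`/`cane` and `car`/`macchina` is applied. With real embeddings of a few hundred dimensions, the orthogonal map is usually computed from the singular value decomposition of the cross-covariance matrix rather than from this 2-D closed form.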

Findings

01

Cross-lingual embeddings align semantically similar words across languages

02

Shared representations enable transfer learning in low-resource languages

03

Potential improvements in machine translation accuracy

Abstract

Learning word embeddings from distributional information has been studied by many researchers, and a large body of work is reported in the literature. By contrast, far fewer studies have addressed the case of multiple languages. The idea is to learn a single representation for a pair of languages in which semantically similar words lie close to one another regardless of language. In this way, when data are missing for a particular language, classifiers trained on another language can be used.
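The last sentence of the abstract describes classifier transfer. The minimal sketch below assumes that the words of both languages already sit in one shared space. A nearest-centroid classifier is trained only on (hypothetical) labelled English vectors and is then applied unchanged to Italian vectors. The data, labels, and function names (`train_centroids`, `predict`) are illustrative assumptions and are not taken from the paper or its repository.

```python
import math

# Hypothetical 2-D vectors assumed to already live in one shared
# cross-lingual space (for example, the output of an alignment step).
EN_LABELLED = {
    "dog": ((1.0, 0.1), "animal"),
    "cat": ((0.9, 0.3), "animal"),
    "car": ((-0.2, 1.0), "vehicle"),
    "bus": ((-0.4, 0.9), "vehicle"),
}

# Hypothetical Italian words placed in the same space; no Italian labels exist.
IT_UNLABELLED = {
    "cane": (0.95, 0.15),
    "macchina": (-0.3, 0.95),
}


def train_centroids(labelled):
    """Nearest-centroid training: average the vectors that share a label."""
    sums, counts = {}, {}
    for (x1, x2), label in labelled.values():
        s = sums.setdefault(label, [0.0, 0.0])
        s[0] += x1
        s[1] += x2
        counts[label] = counts.get(label, 0) + 1
    return {label: (s[0] / counts[label], s[1] / counts[label]) for label, s in sums.items()}


def predict(centroids, vec):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(vec, centroids[label]))


centroids = train_centroids(EN_LABELLED)  # trained on English data only
predictions = {word: predict(centroids, vec) for word, vec in IT_UNLABELLED.items()}
print(predictions)
```

It prints `{'cane': 'animal', 'macchina': 'vehicle'}`. Nearest-centroid is used here only because it fits in a few lines. Any classifier that consumes embedding vectors would transfer the same way, provided the two languages are well aligned in the shared space.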

Code & Models

Repositories

MarcoBerlot/Languages_for_Machine_Translation (official)

Taxonomy

Topics: Natural Language Processing Techniques