Mapping Supervised Bilingual Word Embeddings from English to low-resource languages

Sourav Dutta (Saarland University)

arXiv:1910.06411 · cs.CL · October 16, 2019

TL;DR

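This paper uses a supervised, dictionary-based method to map word embeddings for English and several low-resource languages into a shared vector space. The results suggest that tasks such as machine translation become feasible for these languages once even a limited amount of good bilingual data is available.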

Contribution

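It applies a supervised, dictionary-based approach to map English word embeddings onto those of four low-resource languages (Estonian, Slovenian, Slovak, and Hungarian). It also discusses the potential of unsupervised methods trained on monolingual text.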
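This page does not spell out the exact mapping procedure. One common supervised choice is orthogonal Procrustes. The seed-dictionary pairs are stacked row-wise as X (English) and Z (target language), and a single SVD then gives the orthogonal W that minimizes ||XW - Z||_F. The sketch below is only an illustration under that assumption. The data is synthetic, and the function names (`fit_orthogonal_map`, `translate`) are hypothetical rather than taken from the author's repository.

```python
import numpy as np


def normalize(emb: np.ndarray) -> np.ndarray:
    """Scale every row to unit length so dot products equal cosine similarity."""
    return emb / np.linalg.norm(emb, axis=1, keepdims=True)


def fit_orthogonal_map(X: np.ndarray, Z: np.ndarray) -> np.ndarray:
    """Solve min_W ||X W - Z||_F subject to W^T W = I (orthogonal Procrustes).

    Row i of X (English) and row i of Z (target language) must be a
    translation pair taken from the seed dictionary.
    """
    U, _, Vt = np.linalg.svd(X.T @ Z)
    return U @ Vt


def translate(src_vecs: np.ndarray, W: np.ndarray, tgt_emb: np.ndarray) -> np.ndarray:
    """Index of the cosine-nearest target word for each mapped source vector."""
    scores = normalize(src_vecs @ W) @ normalize(tgt_emb).T
    return scores.argmax(axis=1)


# Synthetic demo (hypothetical data): the "target language" space is a hidden
# rotation of the English space plus a little Gaussian noise.
rng = np.random.default_rng(0)
n_words, dim = 1000, 50
en = rng.normal(size=(n_words, dim))
hidden_rotation, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
tgt = en @ hidden_rotation + 0.01 * rng.normal(size=(n_words, dim))

# English word i translates to target word i; hold out the last 200 pairs.
train, test = np.arange(800), np.arange(800, 1000)
W = fit_orthogonal_map(en[train], tgt[train])
pred = translate(en[test], W, tgt)
precision_at_1 = float((pred == test).mean())
print(f"P@1 on held-out pairs: {precision_at_1:.2f}")
```

Constraining W to be orthogonal preserves distances within the English space. With a small seed dictionary, this often generalizes better than an unconstrained least-squares map.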

Findings

01

Supervised mapping yields encouraging accuracy on bilingual word-retrieval tasks across several retrieval strategies.

02

Even a limited amount of proper bilingual data makes tasks such as machine translation approachable for these languages.

03

Unsupervised approaches are viable when monolingual data is available.

Abstract

It is very challenging to work with low-resource languages because so little data is available for them. Using a dictionary to map independently trained word embeddings into a shared vector space has proved very useful for learning bilingual embeddings in the past. Here we map the embeddings of English words and of their translations in low-resource languages such as Estonian, Slovenian, Slovak, and Hungarian, using a supervised learning approach. We report accuracy scores under various retrieval strategies. These scores show that challenging Natural Language Processing tasks such as machine translation can be approached for these languages, provided that at least some proper bilingual data is available. We also conclude that we can follow an unsupervised learning path on monolingual text data as that is more suitable for…
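The abstract reports accuracy under "various retrieval strategies" but does not name them here. Two widely used options are plain nearest-neighbor search by cosine similarity and cross-domain similarity local scaling (CSLS). CSLS penalizes "hub" target words that sit close to many queries. The sketch below assumes these two strategies and is not confirmed as the paper's exact setup. As a simplification, it computes the source-side neighborhood over the query rows only rather than over the full source vocabulary. The helper `unit` and the toy 2-D vectors are hypothetical.

```python
import numpy as np


def normalize(emb: np.ndarray) -> np.ndarray:
    """Scale every row to unit length so dot products equal cosine similarity."""
    return emb / np.linalg.norm(emb, axis=1, keepdims=True)


def mean_topk(sim: np.ndarray, k: int) -> np.ndarray:
    """Mean of the k largest similarities in each row."""
    return np.partition(sim, -k, axis=1)[:, -k:].mean(axis=1)


def retrieve(mapped_src: np.ndarray, tgt: np.ndarray,
             method: str = "nn", k: int = 10) -> np.ndarray:
    """Index of the retrieved target word for each (already mapped) source vector."""
    sim = normalize(mapped_src) @ normalize(tgt).T  # cosine similarity matrix
    if method == "nn":
        return sim.argmax(axis=1)
    if method == "csls":
        # CSLS(x, y) = 2 cos(x, y) - r_T(x) - r_S(y)
        r_t = mean_topk(sim, k)    # r_T: mean sim of each query to its k nearest targets
        r_s = mean_topk(sim.T, k)  # r_S: mean sim of each target to its k nearest queries
        return (2 * sim - r_t[:, None] - r_s[None, :]).argmax(axis=1)
    raise ValueError(f"unknown retrieval method: {method!r}")


def unit(deg: float) -> list:
    """Hypothetical toy helper: 2-D unit vector at the given angle in degrees."""
    rad = np.radians(deg)
    return [np.cos(rad), np.sin(rad)]


# Toy data: source i should translate to target i.
src = np.array([unit(0), unit(18)])
tgt = np.array([unit(0), unit(40)])
print("NN:  ", retrieve(src, tgt, "nn", k=1))
print("CSLS:", retrieve(src, tgt, "csls", k=1))
```

In this toy case, source 1 lies slightly closer to target 0 than to its intended translation, target 1, so nearest neighbor returns [0, 0]. Target 0 is also the exact match of source 0, so CSLS discounts it and returns [0, 1].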

Code & Models

Repositories

SouravDutta91/map-low-resource-embeddings
Official repository

Taxonomy

TopicsNatural Language Processing Techniques · Topic Modeling · Authorship Attribution and Profiling