Anchor-based Bilingual Word Embeddings for Low-Resource Languages
Tobias Eder, Viktor Hangya, Alexander Fraser

TL;DR
This paper introduces an anchor-based method for building bilingual word embeddings: vectors from a high-resource language serve as anchors, improving both the low-resource language's embeddings and bilingual tasks such as bilingual lexicon induction.
Contribution
It proposes using the high-resource source language's vectors as anchors when training embeddings for the low-resource target language, so that the two embedding spaces are aligned automatically during training.
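The anchoring idea can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `init_anchored_target` and `sgd_step`, the seed dictionary format, and the choice to keep anchor rows frozen during updates are all assumptions made for illustration. The summary only states that source vectors act as anchors.

```python
import numpy as np

def init_anchored_target(src_vecs, seed_pairs, tgt_vocab, dim, seed=0):
    """Initialise target embeddings; seed translations copy their source vector.

    src_vecs:   dict mapping source word -> vector of length dim (hypothetical format)
    seed_pairs: iterable of (source_word, target_word) translation pairs
    tgt_vocab:  list of target-language words
    """
    rng = np.random.default_rng(seed)
    # Non-anchor words start from small random vectors.
    tgt_matrix = rng.normal(scale=0.1, size=(len(tgt_vocab), dim))
    tgt_index = {w: i for i, w in enumerate(tgt_vocab)}
    is_anchor = np.zeros(len(tgt_vocab), dtype=bool)
    for src_word, tgt_word in seed_pairs:
        if src_word in src_vecs and tgt_word in tgt_index:
            row = tgt_index[tgt_word]
            tgt_matrix[row] = src_vecs[src_word]  # anchor: reuse the source vector
            is_anchor[row] = True
    return tgt_matrix, is_anchor

def sgd_step(tgt_matrix, grads, is_anchor, lr=0.05):
    """Apply one gradient step to non-anchor rows only (assumption: anchors stay fixed)."""
    updated = tgt_matrix.copy()
    updated[~is_anchor] -= lr * grads[~is_anchor]
    return updated
```

Because the anchored rows already live in the source space, any target word trained around them would be pulled into that same space, which is the intuition behind alignment happening during training.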
Findings
Improved bilingual lexicon induction performance.
Enhanced monolingual word similarity in low-resource languages.
Effective alignment of bilingual embedding spaces.
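For readers unfamiliar with bilingual lexicon induction, the sketch below shows one common way the task is run and scored: cosine nearest-neighbour retrieval in the shared space, evaluated with precision at 1. The summary does not say which retrieval method or metric the paper uses, so both are assumptions here, as are the names `induce_lexicon` and `precision_at_1`. All vectors are assumed to be non-zero.

```python
import numpy as np

def induce_lexicon(src_words, src_matrix, tgt_words, tgt_matrix):
    """Map each source word to its cosine-nearest target word in a shared space."""
    # Length-normalise rows so that a dot product equals cosine similarity.
    src_unit = src_matrix / np.linalg.norm(src_matrix, axis=1, keepdims=True)
    tgt_unit = tgt_matrix / np.linalg.norm(tgt_matrix, axis=1, keepdims=True)
    sims = src_unit @ tgt_unit.T          # shape: (n_src, n_tgt)
    best = sims.argmax(axis=1)            # index of the nearest target word
    return {s: tgt_words[j] for s, j in zip(src_words, best)}

def precision_at_1(predicted, gold):
    """Fraction of gold source words whose predicted translation is acceptable.

    gold: dict mapping source word -> set of acceptable target translations
    """
    hits = sum(predicted.get(s) in tgts for s, tgts in gold.items())
    return hits / len(gold)
```

A toy call might look like this:

```python
src = np.array([[1.0, 0.0], [0.0, 1.0]])
tgt = np.array([[0.1, 0.9], [0.8, 0.2]])
pred = induce_lexicon(["dog", "cat"], src, ["Katze", "Hund"], tgt)
score = precision_at_1(pred, {"dog": {"Hund"}, "cat": {"Katze"}})
```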
Abstract
Good-quality monolingual word embeddings (MWEs) can be built for languages which have large amounts of unlabeled text. MWEs can be aligned to bilingual spaces using only a few thousand word translation pairs. For low-resource languages, training MWEs monolingually results in MWEs of poor quality, and thus poor bilingual word embeddings (BWEs) as well. This paper proposes a new approach for building BWEs in which the vector space of the high-resource source language is used as a starting point for training an embedding space for the low-resource target language. By using the source vectors as anchors, the vector spaces are automatically aligned during training. We experiment on English-German, English-Hiligaynon, and English-Macedonian. We show that our approach results not only in improved BWEs and bilingual lexicon induction performance, but also in improved target language MWE quality as…
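The abstract mentions that separately trained MWEs can be aligned with a few thousand translation pairs, which is the post-hoc baseline the anchor-based approach is contrasted with. The abstract does not name the mapping technique; the sketch below uses orthogonal Procrustes, a standard choice for this kind of alignment, purely as an illustration. The function name `procrustes_map` is hypothetical.

```python
import numpy as np

def procrustes_map(src_seed, tgt_seed):
    """Return the orthogonal W minimising ||src_seed @ W - tgt_seed||_F.

    src_seed, tgt_seed: arrays of shape (n_pairs, dim); row i of each holds
    the vectors of the i-th translation pair from the seed dictionary.
    """
    # Closed-form solution: SVD of the cross-covariance matrix.
    u, _, vt = np.linalg.svd(src_seed.T @ tgt_seed)
    return u @ vt

# Usage: map every source vector into the target space with `src_matrix @ W`.
```

This kind of mapping depends on both monolingual spaces being of reasonable quality, which is exactly what the abstract says fails for low-resource languages.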
