Beyond Offline Mapping: Learning Cross-lingual Word Embeddings through Context Anchoring
Aitor Ormazabal, Mikel Artetxe, Aitor Soroa, Gorka Labaka, Eneko Agirre

TL;DR
This paper introduces a method for learning cross-lingual word embeddings that avoids the structural-similarity assumption of mapping approaches: it fixes the target language embeddings and learns source language embeddings through context anchoring, outperforming mapping methods on bilingual lexicon induction and obtaining competitive results on XNLI.
Contribution
It proposes a new approach that learns source language embeddings aligned with fixed target embeddings using translated context words, reducing reliance on similar embedding structures.
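The anchoring idea can be illustrated with a minimal sketch. It assumes a skip-gram negative-sampling objective in which the context vector of a translated context word is taken from the fixed target space and only the source word vector is updated. The names `anchored_step`, `anchored_loss`, `seed_dict`, and `tgt_ctx` are hypothetical, and skipping untranslated context words is a simplification made here. Self-learning and iterative restarts are left out, so this is not the authors' implementation.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def anchored_loss(src_vec, tgt_ctx, pos_id, neg_ids):
    """Negative-sampling loss of one source word against fixed target contexts."""
    loss = -math.log(sigmoid(dot(src_vec, tgt_ctx[pos_id])))
    for n in neg_ids:
        loss -= math.log(sigmoid(-dot(src_vec, tgt_ctx[n])))
    return loss

def anchored_step(src_vecs, tgt_ctx, word, ctx_word, seed_dict, neg_ids, lr=0.05):
    """One gradient step on the source word vector only.

    seed_dict maps a source context word to the row of its translation in
    tgt_ctx (assumed format). tgt_ctx is read but never modified, which is
    what keeps the target space fixed.
    """
    if ctx_word not in seed_dict:
        return False  # simplification: untranslated contexts are skipped
    pos_id = seed_dict[ctx_word]
    v = src_vecs[word]
    grad = [0.0] * len(v)
    # The translated context is the positive example; neg_ids are negatives.
    for tid, label in [(pos_id, 1.0)] + [(n, 0.0) for n in neg_ids]:
        c = tgt_ctx[tid]
        g = sigmoid(dot(v, c)) - label
        for i in range(len(v)):
            grad[i] += g * c[i]
    src_vecs[word] = [x - lr * gi for x, gi in zip(v, grad)]
    return True
```

Because the context vectors come from the target space, each update pulls the source word toward the region where its translated contexts live. The two spaces are therefore aligned during training, with no mapping step afterwards.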
Findings
Outperforms traditional mapping methods in bilingual lexicon induction.
Achieves competitive results on the XNLI downstream task.
Requires only a weak seed dictionary for supervision.
Abstract
Recent research on cross-lingual word embeddings has been dominated by unsupervised mapping approaches that align monolingual embeddings. Such methods critically rely on those embeddings having a similar structure, but it was recently shown that the separate training in different languages causes departures from this assumption. In this paper, we propose an alternative approach that does not have this limitation, while requiring a weak seed dictionary (e.g., a list of identical words) as the only form of supervision. Rather than aligning two fixed embedding spaces, our method works by fixing the target language embeddings, and learning a new set of embeddings for the source language that are aligned with them. To that end, we use an extension of skip-gram that leverages translated context words as anchor points, and incorporates self-learning and iterative restarts to reduce the…
Taxonomy
Methods: Self-Learning
