Analyzing the Limitations of Cross-lingual Word Embedding Mappings
Aitor Ormazabal, Mikel Artetxe, Gorka Labaka, Aitor Soroa, Eneko Agirre

TL;DR
This paper investigates the limitations of offline cross-lingual word embedding mapping methods and demonstrates that joint learning approaches can produce more isomorphic embeddings and better bilingual lexicon induction results.
Contribution
The paper compares offline mapping with joint learning on parallel corpora, exposing limitations of current mapping approaches and arguing for joint learning as a way to improve cross-lingual embeddings.
Findings
Joint learning yields more isomorphic embeddings.
Joint learning is less sensitive to hubness.
Joint learning improves bilingual lexicon induction results.
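For readers unfamiliar with hubness: it refers to a few target words ("hubs") turning up as the nearest neighbor of many source words, which hurts nearest-neighbor lexicon induction. The sketch below shows one common way to quantify it, by counting how often each target vector lands in the top-k cosine neighbors of the source vectors. The function name `k_occurrence` and the toy vectors are illustrative assumptions, not the paper's evaluation code.

```python
import numpy as np

def k_occurrence(src, tgt, k=1):
    """Count how often each target vector is among the k nearest
    (cosine) neighbors of a source vector. Hypothetical helper."""
    src = src / np.linalg.norm(src, axis=1, keepdims=True)
    tgt = tgt / np.linalg.norm(tgt, axis=1, keepdims=True)
    sims = src @ tgt.T                         # cosine similarity matrix
    topk = np.argsort(-sims, axis=1)[:, :k]    # k best targets per source
    return np.bincount(topk.ravel(), minlength=tgt.shape[0])

# Toy data (assumed): every source vector points roughly along [1, 1],
# so the third target vector acts as a hub.
tgt = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
src = np.array([[1.0, 0.9], [0.9, 1.0], [1.0, 1.0]])
counts = k_occurrence(src, tgt, k=1)
print(counts)
```

A heavily skewed count vector, with a handful of targets absorbing most of the mass, indicates strong hubness.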
Abstract
Recent research in cross-lingual word embeddings has almost exclusively focused on offline methods, which independently train word embeddings in different languages and map them to a shared space through linear transformations. While several authors have questioned the underlying isomorphism assumption, which states that word embeddings in different languages have approximately the same structure, it is not clear whether this is an inherent limitation of mapping approaches or a more general issue when learning cross-lingual embeddings. So as to answer this question, we experiment with parallel corpora, which allows us to compare offline mapping to an extension of skip-gram that jointly learns both embedding spaces. We observe that, under these ideal conditions, joint learning yields more isomorphic embeddings, is less sensitive to hubness, and obtains stronger results in bilingual…
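As a rough illustration of the offline mapping idea described in the abstract, the sketch below aligns two independently built embedding matrices with an orthogonal linear map (the orthogonal Procrustes solution). This is one widely used form of linear mapping and is shown only as an assumption-laden example: the function name `procrustes_map`, the toy data, and the choice of an orthogonal constraint are not taken from the paper.

```python
import numpy as np

def procrustes_map(X, Y):
    """Return the orthogonal matrix W minimizing ||X @ W - Y||_F,
    where row i of X and row i of Y are a translation pair."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

# Toy setup (assumed): the "source" space is an exact rotation of the
# "target" space, i.e. the two spaces are perfectly isomorphic.
rng = np.random.default_rng(0)
Y = rng.standard_normal((50, 8))                   # target embeddings
Q, _ = np.linalg.qr(rng.standard_normal((8, 8)))   # hidden orthogonal map
X = Y @ Q.T                                        # source embeddings, X @ Q == Y

W = procrustes_map(X, Y)
residual = np.linalg.norm(X @ W - Y)
print(residual)  # close to zero when the spaces are isomorphic
```

In this idealized case the map recovers the alignment almost exactly. When the two spaces are not approximately isomorphic, no single linear map can drive the residual to zero, which is the limitation the abstract raises.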
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Hate Speech and Cyberbullying Detection
