IsoVec: Controlling the Relative Isomorphism of Word Embedding Spaces
Kelly Marchisio, Neha Verma, Kevin Duh, Philipp Koehn

TL;DR
IsoVec introduces a method to enhance the geometric similarity of monolingual word embedding spaces by integrating isomorphism measures into the training process, leading to better cross-lingual mapping and bilingual lexicon induction.
Contribution
The paper presents a novel approach that incorporates global isomorphism measures into the Skip-gram loss to produce more isomorphic embedding spaces for improved cross-lingual tasks.
Findings
Improved bilingual lexicon induction in general data conditions, under domain mismatch, and when the training algorithms differ
Enhanced isomorphism of embedding spaces through the proposed method
Better cross-lingual mapping performance with IsoVec
Abstract
The ability to extract high-quality translation dictionaries from monolingual word embedding spaces depends critically on the geometric similarity of the spaces -- their degree of "isomorphism." We address the root cause of faulty cross-lingual mapping: that word embedding training resulted in the underlying spaces being non-isomorphic. We incorporate global measures of isomorphism directly into the Skip-gram loss function, successfully increasing the relative isomorphism of trained word embedding spaces and improving their ability to be mapped to a shared cross-lingual space. The result is improved bilingual lexicon induction in general data conditions, under domain mismatch, and with training algorithm dissimilarities. We release IsoVec at https://github.com/kellymarchisio/isovec.
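As a rough illustration of what adding an isomorphism measure to the Skip-gram loss can look like, the sketch below combines a negative-sampling Skip-gram term with an orthogonal Procrustes residual computed over seed translation pairs. It is not taken from the IsoVec repository: the function names, the weighting parameter `beta`, the weighted-sum form, and the choice of Procrustes residual as the isomorphism measure are all assumptions made for this example.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def sgns_loss(center, context, negatives):
    """Skip-gram negative-sampling loss for one (center, context) pair.

    center, context: 1-D vectors; negatives: 2-D array, one sampled word per row.
    """
    pos = -np.log(sigmoid(center @ context))
    neg = -np.sum(np.log(sigmoid(-(negatives @ center))))
    return float(pos + neg)


def procrustes_residual(X, Y):
    """Mean squared distance between X @ W and Y for the best orthogonal W.

    X, Y: arrays of shape (n_pairs, dim); row i of X is assumed to be the
    embedding of a seed word and row i of Y the embedding of its translation.
    A value near zero means the two sets differ only by a rotation/reflection.
    """
    U, _, Vt = np.linalg.svd(X.T @ Y)
    W = U @ Vt  # closed-form orthogonal Procrustes solution
    return float(np.mean(np.sum((X @ W - Y) ** 2, axis=1)))


def combined_loss(skipgram_term, isomorphism_term, beta=0.5):
    """Hypothetical weighted sum of the two terms; beta is an assumed knob."""
    return beta * skipgram_term + (1.0 - beta) * isomorphism_term
```

In this sketch, a target-language space that is an exact rotation of the source space yields a residual of approximately zero, so the combined loss reduces to the Skip-gram term alone; any remaining geometric mismatch on the seed pairs adds a penalty scaled by `1 - beta`.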
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Authorship Attribution and Profiling
