IsoVec: Controlling the Relative Isomorphism of Word Embedding Spaces

Kelly Marchisio; Neha Verma; Kevin Duh; Philipp Koehn

arXiv:2210.05098 · cs.CL · July 6, 2023

TL;DR

IsoVec introduces a method to enhance the geometric similarity of monolingual word embedding spaces by integrating isomorphism measures into the training process, leading to better cross-lingual mapping and bilingual lexicon induction.

Contribution

The paper presents a novel approach that incorporates global isomorphism measures into the Skip-gram loss. This produces more isomorphic embedding spaces, which in turn improves performance on cross-lingual tasks such as bilingual lexicon induction.
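
A minimal sketch of the idea, assuming the training objective is a weighted sum of the Skip-gram negative-sampling loss and an isomorphism penalty. The weight `beta`, all function names, and the choice of relational similarity (the Pearson correlation between pairwise cosine similarities of seed translation pairs) as the isomorphism measure are illustrative assumptions, not the exact IsoVec formulation; the repository lists the measures the authors actually use. The code is a NumPy forward computation only; actual training would need an autograd framework such as PyTorch.

```python
import numpy as np

def sgns_loss(center, context, negatives):
    """Skip-gram negative-sampling loss for one (center, context) pair.

    center, context: shape (d,); negatives: shape (k, d).
    """
    def log_sigmoid(x):
        # Numerically stable log(sigmoid(x)) = -log(1 + exp(-x)).
        return -np.logaddexp(0.0, -x)
    pos = log_sigmoid(center @ context)             # pull true context closer
    neg = log_sigmoid(-(negatives @ center)).sum()  # push sampled negatives away
    return -(pos + neg)

def relational_similarity(src, tgt):
    """Pearson correlation of pairwise cosine similarities between
    row-aligned seed translations src[i] <-> tgt[i] (needs >= 3 rows)."""
    def pairwise_cos(x):
        x = x / np.linalg.norm(x, axis=1, keepdims=True)
        sims = x @ x.T
        iu = np.triu_indices(len(x), k=1)  # each unordered pair once, no diagonal
        return sims[iu]
    return np.corrcoef(pairwise_cos(src), pairwise_cos(tgt))[0, 1]

def combined_loss(sg_loss, src_seed, tgt_seed, beta=0.5):
    """Hypothetical weighting: (1 - beta) * Skip-gram + beta * isomorphism penalty.

    The penalty 1 - r is 0 when the seed geometries correlate perfectly.
    """
    iso_penalty = 1.0 - relational_similarity(src_seed, tgt_seed)
    return (1.0 - beta) * sg_loss + beta * iso_penalty
```

Because cosine similarities are unchanged by rotation, two spaces that differ only by an orthogonal transform have a relational similarity of 1 and incur no penalty; the penalty grows as their internal geometries diverge.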

Findings

1. Improved bilingual lexicon induction in general data conditions, under domain mismatch, and with dissimilar training algorithms
2. Enhanced isomorphism of embedding spaces through the proposed method
3. Better cross-lingual mapping performance with IsoVec

Abstract

The ability to extract high-quality translation dictionaries from monolingual word embedding spaces depends critically on the geometric similarity of the spaces (their degree of "isomorphism"). We address the root cause of faulty cross-lingual mapping: that word embedding training resulted in the underlying spaces being non-isomorphic. We incorporate global measures of isomorphism directly into the Skip-gram loss function, successfully increasing the relative isomorphism of trained word embedding spaces and improving their ability to be mapped to a shared cross-lingual space. The result is improved bilingual lexicon induction in general data conditions, under domain mismatch, and with training algorithm dissimilarities. We release IsoVec at https://github.com/kellymarchisio/isovec.
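
To illustrate why isomorphism matters for mapping, here is a minimal sketch of a common supervised pipeline: learn an orthogonal map from seed translation pairs with Procrustes, then induce a lexicon by nearest-neighbor search in the mapped space. The function names are hypothetical, and plain cosine nearest neighbor is used for simplicity; it is not necessarily the mapping or retrieval setup used in the paper's experiments.

```python
import numpy as np

def procrustes(src, tgt):
    """Orthogonal W minimizing ||src @ W - tgt||_F for row-aligned seed pairs.

    Closed form: if src.T @ tgt = U S V^T (SVD), then W = U V^T.
    """
    u, _, vt = np.linalg.svd(src.T @ tgt)
    return u @ vt

def induce_lexicon(src_emb, tgt_emb, W):
    """For each source word, return the index of the nearest target word
    by cosine similarity after mapping the source space with W."""
    mapped = src_emb @ W
    mapped = mapped / np.linalg.norm(mapped, axis=1, keepdims=True)
    tgt = tgt_emb / np.linalg.norm(tgt_emb, axis=1, keepdims=True)
    return (mapped @ tgt.T).argmax(axis=1)
```

When the target space is an exact rotation of the source space (perfect isomorphism), `procrustes` recovers that rotation and every source word retrieves its true translation. The less isomorphic the two spaces, the larger the residual that no orthogonal map can remove, which is the failure mode IsoVec targets during training.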

Code & Models

Repositories

kellymarchisio/isovec
pytorch · Official

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Authorship Attribution and Profiling