Closed Form Word Embedding Alignment

Sunipa Dev; Safia Hassan; Jeff M. Phillips

arXiv:1806.01330·cs.CL·November 19, 2020

TL;DR

This paper introduces a set of simple, closed-form techniques for aligning word embeddings from different sources or mechanisms, enabling better comparison and combination of embeddings while preserving semantic properties.

Contribution

It extends the Absolute Orientation approach to word embeddings, providing new theoretical results and practical methods for optimal alignment, scaling, and similarity maximization.

Findings

01 Alignments improve the preservation of semantic properties like synonyms and analogies.

02 Ensembling aligned embeddings enhances their semantic quality.

03 The methods are computationally efficient and theoretically grounded.

Abstract

We develop a family of techniques to align word embeddings which are derived from different source datasets or created using different mechanisms (e.g., GloVe or word2vec). Our methods are simple and have a closed form to optimally rotate, translate, and scale to minimize root mean squared errors or maximize the average cosine similarity between two embeddings of the same vocabulary into the same dimensional space. Our methods extend approaches known as Absolute Orientation, which are popular for aligning objects in three dimensions, and generalize an approach by Smith et al. (ICLR 2017). We prove new results for optimal scaling and for maximizing cosine similarity. Then we demonstrate how to evaluate the similarity of embeddings from different sources or mechanisms, and that certain properties like synonyms and analogies are preserved across the embeddings and can be enhanced by simply…
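The closed-form rotate, translate, and scale step described in the abstract can be illustrated with a minimal NumPy sketch of a standard Absolute Orientation (orthogonal Procrustes) solve. This is not the authors' code: the function name absolute_orientation, the row-vector convention, and the synthetic data are assumptions made for illustration, and the sketch allows any orthogonal map (including reflections) rather than restricting to proper rotations.

```python
import numpy as np


def absolute_orientation(A, B, scale=True):
    """Align the rows of A to the rows of B in closed form.

    A and B are (n, d) arrays whose i-th rows embed the same word.
    Returns the aligned copy of A, the orthogonal matrix R, and the scale s.
    Hypothetical helper for illustration, not the paper's implementation.
    """
    # Translation: center both embeddings on their means.
    a_mean = A.mean(axis=0)
    b_mean = B.mean(axis=0)
    A_c = A - a_mean
    B_c = B - b_mean

    # Rotation: SVD of the d x d cross-covariance gives the orthogonal R
    # minimizing ||A_c @ R - B_c||_F (the orthogonal Procrustes solution).
    H = A_c.T @ B_c
    U, S, Vt = np.linalg.svd(H)
    R = U @ Vt

    # Scale: least-squares optimal scalar once R is fixed.
    s = S.sum() / (A_c ** 2).sum() if scale else 1.0

    aligned = s * (A_c @ R) + b_mean
    return aligned, R, s


if __name__ == "__main__":
    # Synthetic check: B is a rotated, scaled, shifted copy of A.
    rng = np.random.default_rng(0)
    A = rng.normal(size=(1000, 50))
    Q, _ = np.linalg.qr(rng.normal(size=(50, 50)))  # random orthogonal matrix
    B = 3.0 * (A @ Q) + rng.normal(size=50)

    aligned, R, s = absolute_orientation(A, B)
    rmse = np.sqrt(((aligned - B) ** 2).sum(axis=1).mean())
    print("recovered scale:", s)
    print("RMSE after alignment:", rmse)
```

For real embeddings, A and B would hold the vectors of the shared vocabulary in the same row order. For the cosine-similarity objective, a common variant (assumed here, not confirmed by the text above) is to normalize each row to unit length and apply only the rotation.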

Taxonomy

Methods: GloVe Embeddings