Towards Understanding Linear Word Analogies

Kawin Ethayarajh; David Duvenaud; Graeme Hirst

arXiv:1810.04882 · cs.CL · August 13, 2019

TL;DR

This paper gives a formal explanation of why vector arithmetic on word vectors can often solve word analogies, even though the models that produce those vectors, such as skip-gram with negative sampling (SGNS), are non-linear.

Contribution

It offers a formal explanation for the linear structure in word embeddings without the strong assumptions about the vector space and word distribution that past theories made. It also gives theoretical justification for adding SGNS word vectors and for using Euclidean distance between them.

Findings

01

Proves that a conjecture from past work holds for SGNS: linear substructures exist in the vector space because relations can be represented as ratios.

02

Shows that adding two SGNS word vectors automatically down-weights the more frequent word, as ad hoc weighting schemes do.

03

Offers an information-theoretic interpretation of Euclidean distance in vector spaces, justifying its use.

Abstract

A surprising property of word vectors is that word analogies can often be solved with vector arithmetic. However, it is unclear why arithmetic operators correspond to non-linear embedding models such as skip-gram with negative sampling (SGNS). We provide a formal explanation of this phenomenon without making the strong assumptions that past theories have made about the vector space and word distribution. Our theory has several implications. Past work has conjectured that linear substructures exist in vector spaces because relations can be represented as ratios; we prove that this holds for SGNS. We provide novel justification for the addition of SGNS word vectors by showing that it automatically down-weights the more frequent word, as weighting schemes do ad hoc. Lastly, we offer an information theoretic interpretation of Euclidean distance in vector spaces, justifying its use in…
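To make concrete what "solving an analogy with vector arithmetic" means, here is a minimal sketch in Python. It is not the paper's method or code. The 2-D vectors are hand-picked toy values rather than trained SGNS embeddings, and the helper names `euclidean` and `solve_analogy` are hypothetical. For the analogy "a is to b as c is to ?", the sketch computes b - a + c and returns the nearest remaining word by Euclidean distance.

```python
import math

# Toy 2-D vectors chosen by hand for illustration only.
# These are assumptions, not trained SGNS embeddings.
vectors = {
    "king":  [0.9, 0.8],
    "queen": [0.9, 0.2],
    "man":   [0.1, 0.8],
    "woman": [0.1, 0.2],
    "apple": [0.5, 0.5],
}

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def solve_analogy(a, b, c, vectors):
    """Answer 'a is to b as c is to ?' with vector arithmetic.

    Computes target = b - a + c and returns the word closest to it,
    excluding the three query words themselves.
    """
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return min(candidates, key=lambda w: euclidean(vectors[w], target))

answer = solve_analogy("man", "king", "woman", vectors)
print(answer)
```

With these toy values the offset from "man" to "king" equals the offset from "woman" to "queen", so the arithmetic lands on "queen". Real embeddings only approximate this structure, which is the phenomenon the abstract sets out to explain.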
