Word2vec Conjecture and A Limitative Result

Falcon Z. Dai

arXiv:2010.12719 · cs.CL · October 27, 2020 · 1 citation

TL;DR

This paper asks whether all semantic word-word relations can be represented by vector differences in word embeddings (the word2vec conjecture). It disproves the conjecture by exhibiting a class of relations that cannot be represented this way, which sets a limit on such representations.

Contribution

It introduces the word2vec conjecture on semantic relation representation and provides a formal proof that some relations cannot be captured by vector differences.

Findings

01

The word2vec conjecture, which claims that every semantic relation is representable by vector differences, is false.

02

Not all semantic relations can be represented by vector differences.

03

The limitative result applies to vector spaces over fields of characteristic 0, such as the real or complex numbers.

Abstract

Inspired by the success of word2vec (Mikolov et al., 2013) in capturing analogies, we study the conjecture that analogical relations can be represented by vector spaces. Unlike many previous works that focus on the distributional semantic aspect of word2vec, we study the purely representational question: can all semantic word-word relations be represented by differences (or directions) of vectors? We call this the word2vec conjecture and point out some of its desirable implications. However, we exhibit a class of relations that cannot be represented in this way, thus falsifying the conjecture and establishing a limitative result for the representability of semantic relations by vector spaces over fields of characteristic 0, e.g., the real or complex numbers.
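To make the representational question concrete, here is a minimal Python sketch under one possible reading of "represented by differences of vectors": a relation is represented when every related pair (a, b) has the same offset emb[b] - emb[a]. This reading, the helper names, the toy words, and the cyclic "beats" relation are all assumptions made for illustration; they are not taken from the paper and may differ from the class of relations it exhibits. The sketch shows why characteristic 0 can matter: around a 3-cycle the offsets sum to zero, so a shared offset r satisfies 3r = 0, which forces r = 0 over the integers or reals but not over the integers modulo 3.

```python
from itertools import product


def offset_if_represented(relation, emb, modulus=None):
    """Return the shared offset emb[b] - emb[a] over all pairs, or None.

    Hypothetical helper: `relation` is a list of (a, b) word pairs and
    `emb` maps words to tuples of integers. If `modulus` is given, the
    coordinates are compared modulo that integer.
    """
    def diff(a, b):
        d = tuple(y - x for x, y in zip(emb[a], emb[b]))
        return tuple(c % modulus for c in d) if modulus else d

    offsets = {diff(a, b) for a, b in relation}
    return offsets.pop() if len(offsets) == 1 else None


# An acyclic toy relation that a single offset can represent.
capital_of = [("France", "Paris"), ("Japan", "Tokyo")]
toy_emb = {"France": (0, 0), "Paris": (1, 2), "Japan": (3, 1), "Tokyo": (4, 3)}
linear_offset = offset_if_represented(capital_of, toy_emb)

# A cyclic toy relation (illustrative only, not from the paper).
beats = [("rock", "scissors"), ("scissors", "paper"), ("paper", "rock")]


def search_cyclic(grid):
    """Brute-force 1-D integer embeddings with three distinct points."""
    hits = []
    for r, s, p in product(grid, repeat=3):
        if len({r, s, p}) < 3:
            continue  # the three words must get distinct vectors
        emb = {"rock": (r,), "scissors": (s,), "paper": (p,)}
        if offset_if_represented(beats, emb) is not None:
            hits.append(emb)
    return hits


# Over the integers no embedding on this grid works, since 3r = 0 forces r = 0.
cyclic_hits = search_cyclic(range(-3, 4))

# Modulo 3 the same cycle is representable with offset 1.
mod3_emb = {"rock": (0,), "scissors": (1,), "paper": (2,)}
mod3_offset = offset_if_represented(beats, mod3_emb, modulus=3)

print(linear_offset, cyclic_hits, mod3_offset)
```

The brute-force search only covers a small grid of one-dimensional integer embeddings, but the underlying argument does not depend on the grid or the dimension: the three offsets around the cycle always sum to the zero vector.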

Taxonomy

Topics: Computability, Logic, AI Algorithms · Natural Language Processing Techniques · Topic Modeling