Word2vec Conjecture and A Limitative Result
Falcon Z. Dai

TL;DR
This paper investigates whether all semantic word relations can be represented by vector differences in word embeddings, ultimately disproving the conjecture and establishing fundamental limits of such representations.
Contribution
It introduces the word2vec conjecture on semantic relation representation and provides a formal proof that some relations cannot be captured by vector differences.
Findings
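The word2vec conjecture is false: it fails for a class of semantic relations.
Not all semantic word-word relations can be represented by differences (or directions) of vectors.
The result limits the representability of semantic relations by vector spaces over fields of characteristic 0, such as the real or complex numbers.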
Abstract
Inspired by the success of \texttt{word2vec} \citep{mikolov2013distributed} in capturing analogies, we study the conjecture that analogical relations can be represented by vector spaces. Unlike many previous works that focus on the distributional semantic aspect of \texttt{word2vec}, we study the purely \emph{representational} question: can \emph{all} semantic word-word relations be represented by differences (or directions) of vectors? We call this the word2vec conjecture and point out some of its desirable implications. However, we exhibit a class of relations that cannot be represented in this way, thus falsifying the conjecture and establishing a limitative result for the representability of semantic relations by vector spaces over fields of characteristic 0, e.g., the real or complex numbers.
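To make the representational question concrete, here is a minimal sketch of one way a fixed vector difference can fail, and of why the characteristic of the field might matter. It assumes a hypothetical cyclic relation (rock → paper → scissors → rock) and made-up embeddings; the abstract does not say which class of relations the paper uses, so this should not be read as the paper's construction. The function names and values are illustrative only.

```python
from fractions import Fraction


def differences_around_cycle(vectors):
    """Return v[i+1] - v[i] for each consecutive pair, wrapping around at the end."""
    n = len(vectors)
    return [
        tuple(b - a for a, b in zip(vectors[i], vectors[(i + 1) % n]))
        for i in range(n)
    ]


def sum_vectors(vs):
    """Coordinate-wise sum of a list of equal-length tuples."""
    return tuple(sum(coords) for coords in zip(*vs))


def represents_cycle_mod_p(values, d, p):
    """Check v[i+1] - v[i] == d (mod p) around a cycle of 1-D values in Z/pZ."""
    n = len(values)
    return all((values[(i + 1) % n] - values[i]) % p == d % p for i in range(n))


# Hypothetical embeddings with exact rational coordinates (characteristic 0).
rock = (Fraction(1), Fraction(2))
paper = (Fraction(-3), Fraction(5, 2))
scissors = (Fraction(7), Fraction(0))

# The differences around any cycle telescope to the zero vector.
diffs = differences_around_cycle([rock, paper, scissors])
total = sum_vectors(diffs)

# If all three differences equalled one vector d, then 3 * d = 0, which in
# characteristic 0 forces d = 0 and makes the three words indistinguishable.
all_equal = len(set(diffs)) == 1

# Contrast (assumption for illustration): in Z/3Z, which has characteristic 3,
# the offset d = 1 does represent the 3-cycle 0 -> 1 -> 2 -> 0.
works_mod_3 = represents_cycle_mod_p([0, 1, 2], 1, 3)
```

Whatever embeddings are chosen over the rationals, `total` is the zero vector, so a single nonzero offset cannot link every consecutive pair of a cycle. The modular contrast is only meant to suggest why the abstract restricts its claim to fields of characteristic 0.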
Taxonomy
Topics: Computability, Logic, AI Algorithms · Natural Language Processing Techniques · Topic Modeling
