Distributed Representations of Words and Phrases and their Compositionality
Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, Jeffrey Dean

TL;DR
This paper improves the Skip-gram word embedding model with subsampling of frequent words and negative sampling, and extends it to learn vector representations of phrases as well as individual words.
Contribution
The paper presents three extensions to the Skip-gram model: subsampling of frequent words, negative sampling, and a method for finding phrases in text and learning their representations. Together they improve both the quality of the vectors and the training speed.
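As a rough illustration of the subsampling extension: the full paper discards each occurrence of a word w with probability 1 - sqrt(t / f(w)), where f(w) is the word's relative frequency and t is a threshold of around 1e-5. The sketch below is a minimal Python version of that rule. The function names and the fixed random seed are assumptions made for this example, not part of the original implementation.

```python
import math
import random
from collections import Counter


def discard_probability(freq, t=1e-5):
    """Chance of dropping one occurrence of a word with relative frequency `freq`.

    Words rarer than the threshold `t` are never dropped (probability clamps to 0).
    """
    return max(0.0, 1.0 - math.sqrt(t / freq))


def subsample(tokens, t=1e-5, rng=None):
    """Return a copy of `tokens` with frequent words randomly thinned out."""
    rng = rng or random.Random(0)  # fixed seed: an assumption, for repeatability
    counts = Counter(tokens)
    total = len(tokens)
    kept = []
    for tok in tokens:
        p_drop = discard_probability(counts[tok] / total, t)
        if rng.random() >= p_drop:
            kept.append(tok)
    return kept
```

With t = 1e-5, a word that makes up 10% of the corpus is dropped about 99% of the time, while a word at or below the threshold frequency is always kept. Relative word order among the surviving tokens is preserved.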
Findings
Subsampling of frequent words yields a significant training speedup and more regular word representations.
Negative sampling offers a simple alternative to hierarchical softmax.
The method successfully learns vector representations for millions of phrases.
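To make the negative sampling finding concrete: for one (input word, context word) pair, the full paper maximizes log σ(v_out · v_in) plus the sum of log σ(-v_neg · v_in) over k sampled noise words, and reports that a unigram distribution raised to the 3/4 power works well as the noise distribution. The sketch below evaluates that per-pair objective in plain Python. The function names and the list-based vectors are assumptions for illustration, and no training loop or gradient step is shown.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def negative_sampling_objective(v_input, v_output, v_negatives):
    """Per-pair objective (to be maximized).

    v_input:     vector of the input (center) word
    v_output:    output vector of the observed context word
    v_negatives: output vectors of the k sampled noise words
    """
    positive = log_sigmoid(dot(v_output, v_input))
    negative = sum(log_sigmoid(-dot(v_neg, v_input)) for v_neg in v_negatives)
    return positive + negative


def noise_distribution(counts, power=0.75):
    """Unigram counts raised to `power` and normalized to sum to 1."""
    weights = {w: c ** power for w, c in counts.items()}
    z = sum(weights.values())
    return {w: x / z for w, x in weights.items()}
```

Each pair costs only k + 1 dot products, rather than a normalization over the whole vocabulary, which is what makes this a simple alternative to hierarchical softmax.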
Abstract
The recently introduced continuous Skip-gram model is an efficient method for learning high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships. In this paper we present several extensions that improve both the quality of the vectors and the training speed. By subsampling of the frequent words we obtain significant speedup and also learn more regular word representations. We also describe a simple alternative to the hierarchical softmax called negative sampling. An inherent limitation of word representations is their indifference to word order and their inability to represent idiomatic phrases. For example, the meanings of "Canada" and "Air" cannot be easily combined to obtain "Air Canada". Motivated by this example, we present a simple method for finding phrases in text, and show that learning good vector…
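The phrase-finding method mentioned in the abstract can be sketched as follows. The full paper scores each bigram as (count(a b) - δ) / (count(a) × count(b)), where δ is a discounting coefficient that suppresses phrases built from very infrequent words, and treats bigrams scoring above a chosen threshold as single tokens. The code below is a minimal single-pass version. The function names, the underscore join, and the threshold and δ values are assumptions chosen for this toy example.

```python
from collections import Counter


def phrase_scores(tokens, delta=1.0):
    """Score every adjacent word pair: (count(a b) - delta) / (count(a) * count(b))."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return {
        pair: (n - delta) / (unigrams[pair[0]] * unigrams[pair[1]])
        for pair, n in bigrams.items()
    }


def merge_phrases(tokens, threshold, delta=1.0):
    """Greedy left-to-right pass that joins high-scoring bigrams into one token."""
    scores = phrase_scores(tokens, delta)
    merged, i = [], 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if len(pair) == 2 and scores.get(pair, 0.0) > threshold:
            merged.append("_".join(pair))  # underscore join is an assumption
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


tokens = "air canada flies to canada and air canada lands".split()
print(merge_phrases(tokens, threshold=0.1))
# ['air_canada', 'flies', 'to', 'canada', 'and', 'air_canada', 'lands']
```

In this toy corpus only "air canada" occurs more than once, so it is the only bigram whose discounted count stays positive. The standalone "canada" is left untouched.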
Videos
[Classic] Word2Vec: Distributed Representations of Words and Phrases and their Compositionality · youtube
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Advanced Text Analysis Techniques
Methods: Hierarchical Softmax · Softmax
