Notes on Noise Contrastive Estimation and Negative Sampling
Chris Dyer

TL;DR
This paper clarifies the relationship between noise contrastive estimation (NCE) and negative sampling, showing that NCE is a general, asymptotically unbiased parameter estimation method, whereas negative sampling is mainly useful for word representation learning.
Contribution
It provides a detailed analysis distinguishing NCE from negative sampling, clarifying their theoretical foundations and appropriate applications.
Findings
NCE is a general, asymptotically unbiased parameter estimation technique.
Negative sampling is best read as a binary classification objective that is useful for learning word embeddings, rather than as a general-purpose estimator.
NCE and negative sampling are related but serve different purposes in language modeling.
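One way to make the distinction concrete is to compare the probability each method assigns to a word-context pair $(w, c)$ having come from the data rather than from noise. The notation here is an assumption for illustration and does not appear on this page: $u_\theta(w, c)$ is an unnormalized model score, $q(w)$ is a noise distribution, and $k$ is the number of noise samples per observed word.

```latex
\begin{align*}
p_{\text{NCE}}(D = 1 \mid c, w) &= \frac{u_\theta(w, c)}{u_\theta(w, c) + k\, q(w)} \\
p_{\text{NS}}(D = 1 \mid c, w)  &= \frac{u_\theta(w, c)}{u_\theta(w, c) + 1}
\end{align*}
```

Under this reading, the two forms coincide only when $k\, q(w) = 1$ for every word, that is, when $q$ is uniform and $k$ equals the vocabulary size. In any other setting the negative sampling objective does not target the same model as NCE.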
Abstract
Estimating the parameters of probabilistic models of language such as maxent models and probabilistic neural models is computationally difficult since it involves evaluating partition functions by summing over an entire vocabulary, which may be millions of word types in size. Two closely related strategies---noise contrastive estimation (Mnih and Teh, 2012; Mnih and Kavukcuoglu, 2013; Vaswani et al., 2013) and negative sampling (Mikolov et al., 2012; Goldberg and Levy, 2014)---have emerged as popular solutions to this computational problem, but some confusion remains as to which is more appropriate and when. This document explicates their relationships to each other and to other estimation techniques. The analysis shows that, although they are superficially similar, NCE is a general parameter estimation technique that is asymptotically unbiased, while negative sampling is best…
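The computational point in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names, the `scores` vector of unnormalized log-scores, and the noise distribution `q` are all hypothetical. For simplicity the sketch indexes into a full score vector; a real model would compute scores only for the observed word and the `k` sampled noise words.

```python
import numpy as np

def log_sigmoid(x):
    # Numerically stable log(1 / (1 + exp(-x))).
    return -np.logaddexp(0.0, -x)

def full_log_prob(scores, target):
    # Exact log-softmax: the normalizer sums over the entire vocabulary.
    m = scores.max()
    log_z = m + np.log(np.exp(scores - m).sum())
    return scores[target] - log_z

def nce_loss(scores, target, noise_ids, q, k):
    # Assumes exp(score) is treated as (approximately) self-normalized.
    # The logit of u / (u + k*q(w)) is score(w) - log(k * q(w)).
    pos = scores[target] - np.log(k * q[target])
    neg = scores[noise_ids] - np.log(k * q[noise_ids])
    # Observed word labelled D=1, noise words labelled D=0.
    return -(log_sigmoid(pos) + log_sigmoid(-neg).sum())

def negative_sampling_loss(scores, target, noise_ids):
    # Same binary classification form, but without the log(k * q(w)) correction.
    pos = scores[target]
    neg = scores[noise_ids]
    return -(log_sigmoid(pos) + log_sigmoid(-neg).sum())
```

Only `full_log_prob` requires a sum over every word type; the other two losses depend on one observed word plus the sampled noise words. The sole difference between them is the `log(k * q(w))` term, which vanishes when `q` is uniform and `k` equals the vocabulary size.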
Taxonomy
Topics: Speech and Audio Processing · Bayesian Methods and Mixture Models · Target Tracking and Data Fusion in Sensor Networks
