Understanding Hard Negatives in Noise Contrastive Estimation

Wenzheng Zhang; Karl Stratos

arXiv:2104.06245 · cs.CL · April 14, 2021 · 1 citation

TL;DR

This paper gives a theoretical account of why hard negatives work in noise contrastive estimation: treating the contrastive loss as a biased estimator of the cross-entropy gradient, it shows that hard negatives reduce that bias, and it reports strong zero-shot entity linking results.

Contribution

The paper introduces analytical tools that justify the use of hard negatives and derives a general form of the score function that unifies various architectures used in text retrieval.

Findings

01

Setting the negative distribution to the model distribution reduces the bias of the contrastive loss as an estimator of the cross-entropy gradient.

02

Hard negatives combined with a unified score function improve zero-shot entity linking.

03

Theoretical and empirical evidence supports the bias reduction with hard negatives.

Abstract

The choice of negative examples is important in noise contrastive estimation. Recent works find that hard negatives -- highest-scoring incorrect examples under the model -- are effective in practice, but they are used without a formal justification. We develop analytical tools to understand the role of hard negatives. Specifically, we view the contrastive loss as a biased estimator of the gradient of the cross-entropy loss, and show both theoretically and empirically that setting the negative distribution to be the model distribution results in bias reduction. We also derive a general form of the score function that unifies various architectures used in text retrieval. By combining hard negatives with appropriate score functions, we obtain strong results on the challenging task of zero-shot entity linking.
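The bias-reduction claim in the abstract can be illustrated with a small numeric sketch. The code below is not taken from the paper or its repository; it is a minimal illustration under stated assumptions: a made-up score vector over 50 labels, label 0 as the gold label, 5 negatives, and gradients taken with respect to the scores for a single fixed draw of negatives (a simplification of the expectation-based bias the abstract describes). It compares how far the contrastive gradient lies from the full cross-entropy gradient when the negatives are the highest-scoring incorrect labels versus uniformly sampled incorrect labels. All function and variable names are hypothetical.

```python
import math
import random


def softmax_over(scores, subset):
    """Softmax restricted to `subset`; labels outside it get probability 0."""
    subset = list(subset)
    m = max(scores[y] for y in subset)  # subtract the max for numerical stability
    exps = {y: math.exp(scores[y] - m) for y in subset}
    z = sum(exps.values())
    return [exps[y] / z if y in exps else 0.0 for y in range(len(scores))]


def grad_wrt_scores(scores, gold, subset):
    """Gradient of -log softmax(gold) over `subset`, w.r.t. the score vector."""
    probs = softmax_over(scores, subset)
    probs[gold] -= 1.0
    return probs


def l1_gap(a, b):
    """L1 distance between two gradient vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


# Assumed toy setup: random scores, gold label 0, 5 negatives per example.
rng = random.Random(0)
num_labels, num_negatives, gold = 50, 5, 0
scores = [rng.gauss(0.0, 2.0) for _ in range(num_labels)]
incorrect = [y for y in range(num_labels) if y != gold]

# Hard negatives: highest-scoring incorrect labels under the current scores.
hard_negatives = sorted(incorrect, key=lambda y: scores[y], reverse=True)[:num_negatives]
# Random negatives: uniform sample of incorrect labels.
random_negatives = rng.sample(incorrect, num_negatives)

# Full cross-entropy gradient uses every label; contrastive gradients use gold + negatives.
full_grad = grad_wrt_scores(scores, gold, range(num_labels))
hard_gap = l1_gap(grad_wrt_scores(scores, gold, [gold] + hard_negatives), full_grad)
random_gap = l1_gap(grad_wrt_scores(scores, gold, [gold] + random_negatives), full_grad)

print(f"gap with hard negatives:   {hard_gap:.4f}")
print(f"gap with random negatives: {random_gap:.4f}")
```

In this simplified setting the L1 gap equals twice the softmax probability mass that falls outside the chosen subset, so the top-scoring negatives give the smallest possible gap for a fixed number of negatives. This is only an intuition for the direction of the result, not a restatement of the paper's analysis.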

Code & Models

Repositories

WenzhengZhang/hard-nce-el
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis