Are all negatives created equal in contrastive instance discrimination?
Tiffany Tianhui Cai, Jonathan Frankle, David J. Schwab, and Ari S. Morcos

TL;DR
This paper investigates the importance of negative samples in contrastive instance discrimination for self-supervised learning. It finds that a small subset of the hardest negatives is both necessary and sufficient for learning representations that reach nearly full downstream accuracy.
Contribution
The study demonstrates that focusing on a small subset of the most challenging negatives suffices for effective learning, suggesting potential ways to improve contrastive learning methods.
Findings
The hardest negatives (the hardest 5%) are necessary and sufficient for nearly full accuracy.
The easiest negatives are unnecessary and often unhelpful.
The very hardest negatives can be detrimental to learning.
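To make the difficulty bands in these findings concrete, here is a minimal sketch of ranking a pool of negatives by hardness for one query and slicing out a percentile band. It assumes hardness is the cosine similarity between the query embedding and each negative embedding. The function name `negative_band`, the random embeddings, and the 0.1% cutoff used for the "very hardest" band are illustrative assumptions, not the authors' code or exact settings.

```python
import numpy as np

def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale a vector, or each row of a matrix, to unit L2 norm."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def negative_band(query: np.ndarray, negatives: np.ndarray,
                  lo: float, hi: float) -> np.ndarray:
    """Return indices of negatives whose difficulty rank lies in [lo, hi).

    Hardness is ASSUMED to be cosine similarity to the query:
    rank 0.0 is the hardest (most similar) negative, rank 1.0 the easiest.
    """
    sims = l2_normalize(negatives) @ l2_normalize(query)  # shape (N,)
    order = np.argsort(-sims)                               # hardest first
    n = len(order)
    start, stop = int(round(lo * n)), int(round(hi * n))
    return order[start:stop]

# Usage with random stand-in embeddings (hypothetical data).
rng = np.random.default_rng(0)
q = rng.normal(size=128)
queue = rng.normal(size=(1000, 128))

hardest_5 = negative_band(q, queue, 0.0, 0.05)        # hardest 5%
easiest_95 = negative_band(q, queue, 0.05, 1.0)       # remaining 95%
hard_minus_top = negative_band(q, queue, 0.001, 0.05) # hypothetical: drop the top 0.1%
```

Training would then compute the contrastive loss against only the selected indices, so each band can be tested on its own.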
Abstract
Self-supervised learning has recently begun to rival supervised learning on computer vision tasks. Many of the recent approaches have been based on contrastive instance discrimination (CID), in which the network is trained to recognize two augmented versions of the same instance (a query and a positive) while discriminating against a pool of other instances (negatives). The learned representation is then used on downstream tasks such as image classification. Using methodology from MoCo v2 (Chen et al., 2020), we divided negatives by their difficulty for a given query and studied which difficulty ranges were most important for learning useful representations. We found a minority of negatives (the hardest 5%) were both necessary and sufficient for the downstream task to reach nearly full accuracy. Conversely, the easiest 95% of negatives were unnecessary and insufficient. Moreover, the…
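The CID objective described above is commonly written as the InfoNCE loss used by MoCo: the query should assign high softmax probability to its positive relative to the negatives. Below is a minimal NumPy sketch for a single query, assuming L2-normalized embeddings and a temperature of 0.2. The helper names, the random data, and the roughly 5% cutoff in the usage lines are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def unit(x: np.ndarray) -> np.ndarray:
    """Scale a vector, or each row of a matrix, to unit L2 norm."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def info_nce(query: np.ndarray, positive: np.ndarray,
             negatives: np.ndarray, tau: float = 0.2) -> float:
    """InfoNCE loss for a single query.

    query, positive: (D,) embeddings of two augmentations of one instance.
    negatives: (N, D) embeddings of other instances (e.g. a MoCo-style queue).
    tau = 0.2 is an ASSUMED temperature.
    """
    q, k_pos, k_neg = unit(query), unit(positive), unit(negatives)
    # Logit 0 is the positive pair; logits 1..N are the negatives.
    logits = np.concatenate(([q @ k_pos], k_neg @ q)) / tau
    # Numerically stable log-sum-exp over all N + 1 logits.
    m = logits.max()
    log_denominator = m + np.log(np.exp(logits - m).sum())
    # Negative log softmax probability assigned to the positive.
    return float(log_denominator - logits[0])

# Usage with random stand-in embeddings (hypothetical data).
rng = np.random.default_rng(0)
d = 128
q = rng.normal(size=d)
pos = q + 0.1 * rng.normal(size=d)   # stand-in for a second augmentation
queue = rng.normal(size=(4096, d))

loss_all = info_nce(q, pos, queue)

# Keep only the hardest ~5% of negatives (highest cosine similarity to q).
sims = unit(queue) @ unit(q)
k = int(round(0.05 * len(queue)))
loss_hard = info_nce(q, pos, queue[np.argsort(-sims)[:k]])
```

Because every negative adds a positive term to the denominator, dropping negatives can only lower this loss value; the question the paper asks is which negatives matter for the quality of the learned representation, not for the loss itself.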
Taxonomy
Topics: Categorization, perception, and language
Methods: Dense Connections · Random Gaussian Blur · Feedforward Network · Batch Normalization · Momentum Contrast · InfoNCE · MoCo v2
