Conditional Negative Sampling for Contrastive Learning of Visual Representations
Mike Wu, Milan Mosse, Chengxu Zhuang, Daniel Yamins, Noah Goodman

TL;DR
This paper introduces a conditional negative sampling method for contrastive learning that improves visual representation quality by selecting harder negatives, leading to better performance across multiple datasets and tasks.
Contribution
The paper proposes a family of mutual information estimators that sample negatives conditionally, in a "ring" around each positive, and proves that these estimators lower-bound mutual information.
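For orientation, the NCE bound that these estimators build on is commonly written as follows. This is a sketch in assumed notation, not taken from the listing: the critic f, the number of samples K, and the symbols x and y_j are illustrative choices, with y_1 the positive view and y_2, ..., y_K the negatives.

```latex
I(X; Y) \;\ge\; \mathbb{E}\left[ \log \frac{e^{f(x, y_1)}}{\frac{1}{K} \sum_{j=1}^{K} e^{f(x, y_j)}} \right]
```

In standard NCE the negatives y_2, ..., y_K are drawn independently of x. As the abstract describes it, the conditional variant instead draws them from a distribution that depends on the current instance, favoring examples similar to it.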
Findings
Improves accuracy by 2-5 percentage points under linear evaluation on four standard image datasets when applied on top of IR, CMC, and MoCo.
Enhances transferability of features to new image distributions.
Boosts performance in downstream tasks like detection and segmentation.
Abstract
Recent methods for learning unsupervised visual representations, dubbed contrastive learning, optimize the noise-contrastive estimation (NCE) bound on mutual information between two views of an image. NCE uses randomly sampled negative examples to normalize the objective. In this paper, we show that choosing difficult negatives, or those more similar to the current instance, can yield stronger representations. To do this, we introduce a family of mutual information estimators that sample negatives conditionally -- in a "ring" around each positive. We prove that these estimators lower-bound mutual information, with higher bias but lower variance than NCE. Experimentally, we find our approach, applied on top of existing models (IR, CMC, and MoCo), improves accuracy by 2-5 percentage points in each case, measured by linear evaluation on four standard image datasets. Moreover, we find continued…
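The "ring" idea in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: the function names, the percentile thresholds (50 and 95), the use of cosine similarity, and the temperature of 0.07 are all assumptions chosen for illustration. The sketch keeps only candidates whose similarity to the anchor falls between two percentiles, then evaluates a standard NCE-style loss on negatives drawn from that ring.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products are cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=axis, keepdims=True), eps)


def ring_negative_indices(anchor, candidates, lower_pct=50.0, upper_pct=95.0):
    """Return indices of candidates whose similarity to the anchor lies in a ring.

    The percentile thresholds are hypothetical defaults: the lower one drops
    easy negatives, the upper one drops candidates that are so similar they
    may be near-duplicates of the anchor.
    """
    sims = l2_normalize(candidates) @ l2_normalize(anchor)
    lo, hi = np.percentile(sims, [lower_pct, upper_pct])
    return np.flatnonzero((sims >= lo) & (sims <= hi))


def info_nce_loss(anchor, positive, negatives, temperature=0.07):
    """NCE-style loss: negative log-softmax of the positive among all logits."""
    a = l2_normalize(anchor)
    pos_logit = (l2_normalize(positive) @ a) / temperature
    neg_logits = (l2_normalize(negatives) @ a) / temperature
    logits = np.concatenate([[pos_logit], neg_logits])
    m = logits.max()  # subtract the max for a numerically stable log-sum-exp
    log_denominator = m + np.log(np.exp(logits - m).sum())
    return float(log_denominator - pos_logit)


# Toy data: random embeddings stand in for encoder outputs.
rng = np.random.default_rng(0)
anchor = rng.normal(size=16)
positive = anchor + 0.1 * rng.normal(size=16)  # a second "view" of the anchor
candidates = rng.normal(size=(200, 16))        # pool of possible negatives

ring = ring_negative_indices(anchor, candidates)
chosen = rng.choice(ring, size=32, replace=False)
loss_ring = info_nce_loss(anchor, positive, candidates[chosen])

uniform = rng.choice(len(candidates), size=32, replace=False)
loss_uniform = info_nce_loss(anchor, positive, candidates[uniform])
print(f"ring negatives: {loss_ring:.4f}  uniform negatives: {loss_uniform:.4f}")
```

In a training loop, the candidate pool would typically be a memory bank or queue of encoder outputs rather than random vectors, and the thresholds would be tuned per dataset.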
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications
