Discovering Global False Negatives On the Fly for Self-supervised Contrastive Learning

Vicente Balmaseda; Bokun Wang; Ching-Long Lin; Tianbao Yang

arXiv:2502.20612 · cs.LG · June 27, 2025

TL;DR

This paper introduces GloFND, an optimization-based method that learns a per-anchor threshold during training to identify false negatives in self-supervised contrastive learning. Because detection is global rather than limited to the mini-batch, semantically similar samples anywhere in the dataset are no longer pushed apart, which improves the learned representations.

Contribution

GloFND is the first approach to globally detect false negatives across the entire dataset during contrastive learning, with per-iteration cost independent of dataset size.

Findings

01 GloFND effectively identifies false negatives in image and image-text datasets.

02 The method improves the quality of learned embeddings compared to baseline approaches.

03 Experimental results show enhanced downstream task performance.

Abstract

In self-supervised contrastive learning, negative pairs are typically constructed using an anchor image and a sample drawn from the entire dataset, excluding the anchor. However, this approach can result in the creation of negative pairs with similar semantics, referred to as "false negatives", leading to their embeddings being falsely pushed apart. To address this issue, we introduce GloFND, an optimization-based approach that automatically learns on the fly the threshold for each anchor data to identify its false negatives during training. In contrast to previous methods for false negative discovery, our approach globally detects false negatives across the entire dataset rather than locally within the mini-batch. Moreover, its per-iteration computation cost remains independent of the dataset size. Experimental results on image and image-text data demonstrate the effectiveness of the…
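
The idea of learning a per-anchor threshold on the fly can be illustrated with a small sketch. This is not the authors' implementation: it assumes the threshold should sit at the point where a fraction `alpha` of similarities to the anchor lies above it, and it swaps in a plain stochastic quantile (pinball-loss) subgradient step as the optimizer. The names `update_threshold`, `false_negative_mask`, `simulate`, `alpha`, and `lr` are hypothetical, and the similarities are synthetic uniform draws rather than real embeddings.

```python
import random


def update_threshold(lam, sims, alpha, lr):
    """One stochastic step on an anchor's threshold (illustrative assumption).

    Moves lam toward the value that a fraction `alpha` of similarities exceed.
    Only the mini-batch similarities `sims` are used, so the cost of a step
    depends on the batch size, not on the dataset size.
    """
    frac_above = sum(s > lam for s in sims) / len(sims)
    # Negative pinball-loss subgradient: raise lam when too many samples exceed it.
    return lam + lr * (frac_above - alpha)


def false_negative_mask(sims, lam):
    """Flag negatives whose similarity to the anchor exceeds its threshold."""
    return [s > lam for s in sims]


def simulate(alpha=0.1, lr=0.01, steps=5000, batch_size=64, seed=0):
    """Track the threshold for one anchor on synthetic similarities in [-1, 1]."""
    rng = random.Random(seed)
    lam = 0.0
    for _ in range(steps):
        sims = [rng.uniform(-1.0, 1.0) for _ in range(batch_size)]
        lam = update_threshold(lam, sims, alpha, lr)
    return lam


if __name__ == "__main__":
    lam = simulate()
    # For uniform similarities on [-1, 1] and alpha = 0.1, the target is 0.8.
    print(f"learned threshold: {lam:.3f}")
    print(false_negative_mask([0.9, 0.1, 0.85], lam))
```

In a training loop, negatives flagged by the mask would be excluded from (or down-weighted in) the contrastive loss for that anchor. How GloFND itself formulates and solves the threshold problem is described in the paper and the official repository.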

Code & Models

Repositories

vibalcam/glofnd (PyTorch, official)