A Survey on Data Curation for Visual Contrastive Learning: Why Crafting Effective Positive and Negative Pairs Matters

Shasvat Desai; Debasmita Ghose; Deep Chakraborty

arXiv:2502.08134 · cs.CV · February 13, 2025

TL;DR

This survey reviews data curation techniques for positive and negative pair selection in visual contrastive learning, emphasizing their importance for improving representation quality and training efficiency.

Contribution

It provides a comprehensive taxonomy and detailed analysis of existing methods for data curation in contrastive learning.

Findings

01

Effective pair curation enhances representation quality

02

Proper data curation accelerates training convergence

03

Taxonomy aids in understanding and developing new curation techniques

Abstract

Visual contrastive learning aims to learn representations by contrasting similar (positive) and dissimilar (negative) pairs of data samples. The design of these pairs significantly impacts representation quality, training efficiency, and computational cost. A well-curated set of pairs leads to stronger representations and faster convergence. As contrastive pre-training sees wider adoption for solving downstream tasks, data curation becomes essential for optimizing its effectiveness. In this survey, we attempt to create a taxonomy of existing techniques for positive and negative pair curation in contrastive learning, and describe them in detail.
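To make the role of pair selection concrete, the following is a minimal sketch of an InfoNCE-style contrastive loss. It assumes InfoNCE as a representative objective; the abstract does not name a specific loss. The function names, the toy 2-D embeddings, and the temperature of 0.1 are all illustrative assumptions and do not come from the survey.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss for one anchor (illustrative sketch).

    The loss is the negative log-softmax of the positive pair's
    similarity among the positive and all negative similarities.
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denominator - logits[0]

# Hypothetical toy embeddings, chosen only for illustration.
anchor = [1.0, 0.0]
positive = [0.9, 0.1]
easy_negatives = [[-1.0, 0.0], [0.0, -1.0]]   # far from the anchor
hard_negatives = [[0.8, 0.6], [0.7, -0.7]]    # close to the anchor

loss_easy = info_nce(anchor, positive, easy_negatives)
loss_hard = info_nce(anchor, positive, hard_negatives)
print(f"easy negatives: {loss_easy:.6f}, hard negatives: {loss_hard:.6f}")
```

In this toy setup, negatives that lie close to the anchor produce a larger loss than distant ones, which is one way the choice of pairs can shape the training signal.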

Taxonomy

Topics: Data Visualization and Analytics · Cell Image Analysis Techniques · Generative Adversarial Networks and Image Synthesis

Methods: Contrastive Learning · Sparse Evolutionary Training