A Survey on Data Curation for Visual Contrastive Learning: Why Crafting Effective Positive and Negative Pairs Matters
Shasvat Desai, Debasmita Ghose, Deep Chakraborty

TL;DR
This survey reviews data curation techniques for positive and negative pair selection in visual contrastive learning, emphasizing their importance for improving representation quality and training efficiency.
Contribution
It provides a comprehensive taxonomy and detailed analysis of existing methods for data curation in contrastive learning.
Findings
Effective pair curation enhances representation quality
Proper data curation accelerates training convergence
Taxonomy aids in understanding and developing new curation techniques
Abstract
Visual contrastive learning aims to learn representations by contrasting similar (positive) and dissimilar (negative) pairs of data samples. The design of these pairs significantly impacts representation quality, training efficiency, and computational cost. A well-curated set of pairs leads to stronger representations and faster convergence. As contrastive pre-training sees wider adoption for solving downstream tasks, data curation becomes essential for optimizing its effectiveness. In this survey, we attempt to create a taxonomy of existing techniques for positive and negative pair curation in contrastive learning, and describe them in detail.
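To make the role of pair design concrete, here is a minimal sketch of a contrastive objective for a single anchor. It assumes an InfoNCE-style loss with cosine similarity and a temperature of 0.1; the abstract does not name a specific loss, so the formulation, the function names, and the toy 2-D embeddings are illustrative assumptions rather than the survey's method.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss for one anchor (assumed formulation, not from the survey).

    The loss is the negative log-probability of picking the positive
    among {positive} + negatives under a softmax over scaled similarities.
    """
    pos_logit = cosine_similarity(anchor, positive) / temperature
    logits = [pos_logit] + [
        cosine_similarity(anchor, n) / temperature for n in negatives
    ]
    # Log-sum-exp with the max subtracted for numerical stability.
    max_logit = max(logits)
    log_denominator = max_logit + math.log(
        sum(math.exp(l - max_logit) for l in logits)
    )
    return log_denominator - pos_logit


# Toy embeddings (hypothetical values chosen only for illustration).
anchor = [1.0, 0.0]
positive = [0.9, 0.1]

# Easy negatives point away from the anchor; a hard negative sits close to it.
easy_loss = info_nce_loss(anchor, positive, negatives=[[0.0, 1.0], [-1.0, 0.0]])
hard_loss = info_nce_loss(anchor, positive, negatives=[[0.95, 0.05]])
```

With these toy values, the hard negative produces a noticeably larger loss than the easy negatives. That is one way to see why the choice of pairs affects the training signal: negatives that are trivially far from the anchor contribute almost nothing to the loss.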
Taxonomy
Topics: Data Visualization and Analytics · Cell Image Analysis Techniques · Generative Adversarial Networks and Image Synthesis
Methods: Contrastive Learning · Sparse Evolutionary Training
