Synthetic Data Can Also Teach: Synthesizing Effective Data for Unsupervised Visual Representation Learning
Yawen Wu, Zhepeng Wang, Dewen Zeng, Yiyu Shi, Jingtong Hu

TL;DR
This paper introduces a novel data generation framework that enhances contrastive learning by creating synthetic hard samples and positive pairs, leading to improved accuracy and data efficiency in visual representation learning.
Contribution
The paper presents a joint sample generation and contrastive learning framework that dynamically generates hard samples and positive pairs, significantly improving CL performance.
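To make "hard sample" concrete, here is a minimal sketch that assumes an InfoNCE-style contrastive loss. InfoNCE is the standard objective in SimCLR/MoCo-style CL, but this page does not state the paper's exact loss. The sketch shows that a negative lying close to the anchor in embedding space yields a larger loss than one pointing away from it, so it carries more training signal. The vectors, the temperature of 0.5, and names such as `info_nce` are illustrative assumptions, not the authors' code.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def info_nce(anchor: Vector, positive: Vector, negatives: Sequence[Vector],
             temperature: float = 0.5) -> float:
    """InfoNCE loss for one anchor: negative log softmax score of the positive.

    The temperature value is an assumption for illustration.
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_denominator - logits[0]


# Toy embeddings (hypothetical values).
anchor = [1.0, 0.0]
positive = [0.9, 0.1]
easy_negative = [-1.0, 0.2]   # points away from the anchor
hard_negative = [0.8, 0.4]    # close to the anchor, hard to tell apart

print(round(info_nce(anchor, positive, [easy_negative]), 3))
print(round(info_nce(anchor, positive, [hard_negative]), 3))
```

The hard negative produces the larger loss. A generator that is rewarded for raising this loss is therefore pushed toward producing such near-anchor samples.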
Findings
Up to 4.0% linear-classification accuracy improvement on ImageNet-100
Up to 2x data efficiency for linear classification
Up to 5x data efficiency for transfer learning
Abstract
Contrastive learning (CL), a self-supervised learning approach, can effectively learn visual representations from unlabeled data. Given the CL training data, generative models can be trained to generate synthetic data to supplement the real data. Using both synthetic and real data for CL training has the potential to improve the quality of learned representations. However, synthetic data usually has lower quality than real data, and using synthetic data may not improve CL compared with using real data. To tackle this problem, we propose a data generation framework with two methods to improve CL training by joint sample generation and contrastive learning. The first approach generates hard samples for the main model. The generator is jointly learned with the main model to dynamically customize hard samples based on the training state of the main model. Besides, a pair of data generators…
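One common way to realize a generator that is "jointly learned with the main model" is an alternating min-max game. The generator takes a gradient-ascent step on the main model's contrastive loss, which makes its sample harder for the model's current state. The main model then takes a descent step against that sample. The sketch below assumes this formulation; the abstract does not spell out the update rule. Everything in it is hypothetical and chosen only to keep the example small and self-contained:

- toy 2-D data
- a linear encoder `W` standing in for the main model
- dot-product similarity
- finite-difference gradients in place of backpropagation
- a norm bound `EPS` that keeps the generated sample near a real one

It does not model the pair of positive-pair generators, whose description is truncated above.

```python
import math

TAU = 0.5   # temperature (assumed value)
EPS = 0.5   # max perturbation norm for generated samples (assumption)
H = 1e-5    # finite-difference step size

# Hypothetical toy "dataset": anchor, its positive view, one real negative,
# and a real sample that the generator perturbs into a hard negative.
ANCHOR, POSITIVE = [1.0, 0.0], [0.9, 0.1]
REAL_NEG, BASE = [-1.0, 0.2], [0.5, 0.5]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def encode(W, x):
    """Linear encoder z = W x, standing in for the main model."""
    return [dot(row, x) for row in W]


def unflatten(flat):
    """Turn a flat list of 4 weights back into a 2x2 matrix."""
    return [flat[0:2], flat[2:4]]


def loss_fn(W, delta):
    """InfoNCE with dot-product similarity; generated negative = BASE + delta."""
    za = encode(W, ANCHOR)
    generated = [b + d for b, d in zip(BASE, delta)]
    keys = [encode(W, x) for x in (POSITIVE, REAL_NEG, generated)]
    logits = [dot(za, zk) / TAU for zk in keys]
    m = max(logits)
    return m + math.log(sum(math.exp(z - m) for z in logits)) - logits[0]


def num_grad(f, params):
    """Central-difference gradient of f at a flat list of parameters."""
    grad = []
    for i in range(len(params)):
        up, down = params[:], params[:]
        up[i] += H
        down[i] -= H
        grad.append((f(up) - f(down)) / (2 * H))
    return grad


def generator_step(W, delta, lr=0.1):
    """Gradient ascent: make the generated negative harder for the current W."""
    g = num_grad(lambda d: loss_fn(W, d), delta)
    new = [d + lr * gi for d, gi in zip(delta, g)]
    norm = math.sqrt(dot(new, new))
    if norm > EPS:  # project back onto the allowed perturbation ball
        new = [x * EPS / norm for x in new]
    return new


def main_model_step(W, delta, lr=0.05):
    """Gradient descent on the encoder weights against the current hard negative."""
    flat = [w for row in W for w in row]
    g = num_grad(lambda f: loss_fn(unflatten(f), delta), flat)
    return unflatten([w - lr * gi for w, gi in zip(flat, g)])


W, delta = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
for step in range(5):
    delta = generator_step(W, delta)  # generator adapts to the model's state
    W = main_model_step(W, delta)     # model learns from the harder sample
    print(step, round(loss_fn(W, delta), 4), [round(d, 3) for d in delta])
```

Because the generator is updated against the current `W`, the sample it produces tracks the main model's training state rather than being fixed in advance. That is the property the abstract attributes to joint learning. In a real system both players would be neural networks trained with automatic differentiation.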
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques
