Do Generated Data Always Help Contrastive Learning?
Yifei Wang, Jizhe Zhang, Yisen Wang

TL;DR
This paper investigates when generated data helps contrastive learning, revealing that balancing data inflation and augmentation is crucial, and introduces AdaInf, a strategy that improves performance without extra computation.
Contribution
It uncovers the interplay between data inflation and augmentation in contrastive learning and proposes AdaInf, a data-centric method that enhances performance without additional costs.
Findings
AdaInf improves contrastive learning performance.
Achieves 94.70% accuracy on CIFAR-10 with SimCLR.
Theoretical analysis explains when generated data is beneficial.
Abstract
Contrastive Learning (CL) has emerged as one of the most successful paradigms for unsupervised visual representation learning, yet it often depends on intensive manual data augmentations. With the rise of generative models, especially diffusion models, the ability to generate realistic images close to the real data distribution has been well recognized. These generated high-quality images have been successfully applied to enhance contrastive representation learning, a technique termed "data inflation". However, we find that the generated data (even from a good diffusion model like DDPM) may sometimes even harm contrastive learning. We investigate the causes behind this failure from the perspective of both data inflation and data augmentation. For the first time, we reveal their complementary roles: stronger data inflation should be accompanied by weaker augmentations, and vice…
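The interplay described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the names `AugConfig`, `inflate`, `real_fraction`, and `weaken`, the weakening rule, and all numeric values other than the CIFAR-10 training-set size are assumptions made for illustration. It shows two knobs: how much weight real data keeps after generated data is mixed in (here controlled by replicating real samples), and how an augmentation setting might be made weaker as inflation grows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AugConfig:
    """Hypothetical augmentation knobs; names and values are illustrative only."""
    min_crop_scale: float         # lower bound on the random-crop area ratio
    color_jitter_strength: float  # multiplier on color-jitter magnitude


def inflate(real, generated, real_replication=1):
    """Mix real and generated samples, repeating each real sample
    `real_replication` times so it keeps more weight in the mixture."""
    if real_replication < 1:
        raise ValueError("real_replication must be >= 1")
    mixed = [(x, "real") for x in real for _ in range(real_replication)]
    mixed.extend((x, "generated") for x in generated)
    return mixed


def real_fraction(n_real, n_generated, real_replication=1):
    """Fraction of the inflated dataset that consists of real samples."""
    weighted_real = n_real * real_replication
    return weighted_real / (weighted_real + n_generated)


def weaken(cfg, factor):
    """Assumed weakening rule: factor=1.0 leaves cfg unchanged, smaller
    factors crop less aggressively and jitter colors less."""
    if not 0.0 < factor <= 1.0:
        raise ValueError("factor must be in (0, 1]")
    return AugConfig(
        min_crop_scale=1.0 - (1.0 - cfg.min_crop_scale) * factor,
        color_jitter_strength=cfg.color_jitter_strength * factor,
    )


if __name__ == "__main__":
    n_real = 50_000          # CIFAR-10 training-set size
    n_generated = 1_000_000  # illustrative number of generated images
    for k in (1, 10):
        share = real_fraction(n_real, n_generated, k)
        print(f"replication={k}: real share = {share:.3f}")
    # Illustrative starting values, halved in strength.
    print(weaken(AugConfig(min_crop_scale=0.08, color_jitter_strength=0.5), 0.5))
```

Under these illustrative numbers, unreplicated real data makes up under 5% of the mixture, while a replication factor of 10 raises its share to one third. Which mixing ratio and augmentation strength actually work best is an empirical question that the paper addresses; the sketch only makes the two knobs concrete.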
Peer Reviews
Decision: ICLR 2024 poster
The insights and results provided by this paper seem novel, creative and significant. I believe the results would greatly help in our understanding of contrastive learning both theoretically and empirically. The paper is very well-written in general except some proofs in the appendix which I will describe in detail in the weaknesses section.
The proposed AdaInf method seems to rely on the downstream task to find the optimal weighting factor and augmentation strength. This might limit its use in practice, since the goal of self-supervised learning is to learn a good representation from training data that can be useful in future downstream tasks that are not known a priori. Some discussion on this would be useful, either as a limitation or to clarify the exposition in case I misunderstood something. Eq. (7) in the proof of Theorem 4.1: I think
The paper is generally well written, and the results are interesting (albeit limited in scale, see below, which limits significance). Some theoretical intuition is given, although the connection between theory and empirical results could be made clearer.
Theory and empirical experiments are not well connected, in my opinion (I would be happy to be corrected on this during the rebuttal). Here is how I read the paper: Leveraging synthetic data requires defining how it will be mixed with real data. Also, as image statistics of synthetic data might differ from those of real data, the existing augmentations applied in contrastive learning might be suboptimal; hence, re-tuning both aspects of the model could help to boost performance. If the autho
- This paper is well-structured.
- The theoretical results and findings are interesting.
- The experimental design is good. Experiments and ablation studies successfully verify theoretical results.
- The preliminary section is not well-written.
- There is a concern about the generality of AdaInf. It seems that the method is hand-crafted rather than adaptive.
- There is a lack of motivation for considering diffusion models in this paper. All theoretical results can be generalized to any generative model.
- Some notation and experiment results are not consistent.
Taxonomy
Topics: Intelligent Tutoring Systems and Adaptive Learning · Innovative Teaching and Learning Methods
Methods: Average Pooling · Color Jitter · Normalized Temperature-scaled Cross Entropy Loss · Dense Connections · Max Pooling · Kaiming Initialization · Diffusion · Global Average Pooling
