Effective Data Augmentation With Diffusion Models
Brandon Trabucco, Kyle Doherty, Max Gurinas, Ruslan Salakhutdinov

TL;DR
This paper introduces a novel data augmentation method using pre-trained diffusion models to modify high-level semantic attributes of images, improving diversity and accuracy in few-shot and real-world classification tasks.
Contribution
It proposes a new image-to-image augmentation technique leveraging diffusion models to alter semantic content, enhancing data diversity beyond simple transformations.
Findings
Improved classification accuracy in few-shot learning scenarios.
Enhanced diversity of augmented images along semantic axes.
Effective generalization to new visual concepts with minimal labels.
Abstract
Data augmentation is one of the most prevalent tools in deep learning, underpinning many recent advances, including those from classification, generative models, and representation learning. The standard approach to data augmentation combines simple transformations like rotations and flips to generate new images from existing ones. However, these new images lack diversity along key semantic axes present in the data. Current augmentations cannot alter the high-level semantic attributes, such as animal species present in a scene, to enhance the diversity of data. We address the lack of diversity in data augmentation with image-to-image transformations parameterized by pre-trained text-to-image diffusion models. Our method edits images to change their semantics using an off-the-shelf diffusion model, and generalizes to novel visual concepts from a few labelled examples. We evaluate our…
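One common way to edit a real image with a pre-trained diffusion model is to noise it partway along the forward process and then denoise from that point. A reviewer below notes that the paper analyzes the time step at which the real image is inserted during generation. The sketch below covers only the noising half of that idea. The linear beta schedule, the function names `alpha_bar_schedule` and `noise_to_step`, and the fraction `t0` are illustrative assumptions, not taken from the paper or its code.

```python
import math
import random

def alpha_bar_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) under an assumed linear beta schedule."""
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    out, running = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        running *= 1.0 - beta
        out.append(running)
    return out

def noise_to_step(pixels, t0, num_steps=1000, rng=None):
    """Noise a flat list of pixel values to the step given by fraction t0 in [0, 1].

    Returns (noised_pixels, step). A larger t0 discards more of the real image,
    which leaves the reverse diffusion (not shown here) more freedom to change it.
    """
    if not 0.0 <= t0 <= 1.0:
        raise ValueError("t0 must lie in [0, 1]")
    rng = rng or random.Random()
    step = min(num_steps - 1, int(t0 * num_steps))
    a = alpha_bar_schedule(num_steps)[step]
    signal, sigma = math.sqrt(a), math.sqrt(1.0 - a)
    # Standard forward-process sample: sqrt(a) * x0 + sqrt(1 - a) * noise.
    return [signal * p + sigma * rng.gauss(0.0, 1.0) for p in pixels], step
```

In a full pipeline the noised image would be handed to the diffusion model's denoising loop starting at `step`, conditioned on a text prompt. That part depends on the specific model and is omitted here.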
Peer Reviews
Decision·ICLR 2024 poster
The work on DA-Fusion offers several strengths and introduces significant novelty in the context of data augmentation:
* Novel Data Augmentation Technique: DA-Fusion introduces a unique approach to data augmentation by leveraging large pretrained generative models. It goes beyond traditional data augmentation methods that mainly involve geometric transformations. This novelty lies in utilizing generative models to create diverse and semantically meaningful variations of real images.
* Semantic P
Here are some of the weaknesses of this work:
* Complexity and Computational Cost: The proposed method involves fine-tuning pseudo-prompts for each concept, which can be computationally expensive and time-consuming. This approach might not be as practical as traditional data augmentation techniques that are computationally efficient and easy to implement.
* Lack of Control Over Augmentations: While DA-Fusion introduces the concept of modifying images while respecting their semantic attributes, i
- The paper was well written and easy to follow.
- Performing textual inversion for novel concepts seems like a promising idea.
1. Generalizability of the method.
- It seems like the learned tokens are helpful for learning dataset-specific biases, as they can add additional information on the general “style” of the dataset and of its classes. However, the method seems to make certain assumptions about the datasets, e.g., that the images are object-centric (only the target class is present) and have standard poses/viewpoints. It is not clear if there are substantial variations within the images of that class, e.g. in iWildCam,
1. The proposed method that combines several existing techniques for data augmentation is interesting.
2. The paper provides insightful analysis on several design choices, including the time step at which the real image is inserted during generation, strategies to prevent leakage of internet data, mixing ratio of real and synthetic images and the number of augmentations generated for each image.
3. The paper will release code and an aerial imagery dataset of leafy spurge, which will facilitate fu
1. The proposed technique is applicable only to classification tasks. It is not clear how it can be applied to object detection and segmentation tasks. My thinking is that one of the major drawbacks of such generative-model-based augmentation methods compared with traditional methods may be that they cannot simultaneously generate the segmentation mask and bounding box annotation for the augmented images.
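Two of the design choices the reviewers mention, the mixing ratio of real and synthetic images and the number of augmentations generated per image, can be illustrated with a small sampling routine. This is a minimal sketch under stated assumptions: the name `sample_example`, the `synthetic_prob` parameter, and the dictionary layout are hypothetical, and the paper's actual mixing scheme may differ.

```python
import random

def sample_example(index, real, synthetic, synthetic_prob=0.5, rng=random):
    """Return the real example at `index` or one of its synthetic variants.

    real: list of real examples.
    synthetic: dict mapping a real index to its list of generated variants.
    synthetic_prob: assumed mixing ratio, the chance of using a variant.
    """
    if not 0.0 <= synthetic_prob <= 1.0:
        raise ValueError("synthetic_prob must lie in [0, 1]")
    variants = synthetic.get(index, [])
    # Fall back to the real example when no variants were generated for it.
    if variants and rng.random() < synthetic_prob:
        return rng.choice(variants)
    return real[index]

real = ["real_0", "real_1"]
synthetic = {0: ["aug_0a", "aug_0b"]}  # two augmentations for image 0, none for image 1
rng = random.Random(0)
picks = [sample_example(0, real, synthetic, 0.5, rng) for _ in range(8)]
```

Setting `synthetic_prob` to 0 recovers training on real images only, while 1 uses synthetic variants whenever they exist. The length of each list in `synthetic` plays the role of the number of augmentations per image.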
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning
Methods: Diffusion
