Do We Need All the Synthetic Data? Targeted Image Augmentation via Diffusion Models
Dang Nguyen, Jiping Li, Jinghao Zheng, Baharan Mirzasoleiman

TL;DR
TADA is a targeted data augmentation framework using diffusion models that selectively enhances under-learned examples, leading to improved generalization across various architectures and datasets with reduced computational cost.
Contribution
The paper introduces TADA, a novel targeted augmentation method that selectively augments a subset of data, outperforming full dataset augmentation and reducing computational overhead.
Findings
Augmenting only 30–40% of the data improves accuracy by up to 2.8%.
TADA outperforms full augmentation and state-of-the-art optimizers.
Effective on multiple architectures and tasks, including object detection.
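The reduced-cost claim comes down to simple arithmetic. Suppose generation cost grows roughly linearly with the number of synthetic images (an assumption). Then augmenting a fraction p of N training examples with k synthetic copies each produces pNk images instead of Nk. The sketch below is a minimal illustration: the helper name `synthetic_image_count`, the 50,000-image training set, and k = 10 copies are all hypothetical values, not settings reported by the paper.

```python
def synthetic_image_count(n_train: int, fraction_augmented: float, copies_per_example: int) -> int:
    """Synthetic images needed when `fraction_augmented` of the training set
    receives `copies_per_example` generated copies each (hypothetical helper)."""
    if not 0.0 <= fraction_augmented <= 1.0:
        raise ValueError("fraction_augmented must be in [0, 1]")
    # round() guards against float error in n_train * fraction_augmented
    return round(n_train * fraction_augmented) * copies_per_example

# Hypothetical setting: 50,000 training images, 10 synthetic copies per augmented example.
full = synthetic_image_count(50_000, 1.0, 10)       # augment every example
targeted = synthetic_image_count(50_000, 0.35, 10)  # augment ~35% of examples
print(full, targeted, targeted / full)
```

Under these assumptions, targeting 30–40% of the data cuts the number of generated images by 60–70% compared with augmenting every example at the same per-example factor.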
Abstract
Synthetically augmenting training datasets with diffusion models has become an effective strategy for improving the generalization of image classifiers. However, existing approaches typically increase dataset size by 10-30x and struggle to ensure generation diversity, leading to substantial computational overhead. In this work, we introduce TADA (TArgeted Diffusion Augmentation), a principled framework that selectively augments examples that are not learned early in training using faithful synthetic images that preserve semantic features while varying noise. We show that augmenting only this targeted subset consistently outperforms augmenting the entire dataset. Through theoretical analysis on a two-layer CNN, we prove that TADA improves generalization by promoting homogeneity in feature learning speed without amplifying noise. Extensive experiments demonstrate that by augmenting only…
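The core loop described in the abstract has two steps. First, find examples that are not learned early in training. Second, add diffusion-generated variants of only those examples. The sketch below shows that loop under stated assumptions and is not the authors' implementation. It approximates "not learned early" as "misclassified at an early checkpoint", because the exact selection rule is not given in this excerpt. It also treats `generate` as a hypothetical placeholder for a diffusion model that returns a faithful variant of an image. The function names `select_targets` and `build_augmented_set` are illustrative.

```python
import numpy as np

def select_targets(early_logits: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Return indices of examples misclassified at an early training checkpoint.

    Assumption: "not learned early in training" is approximated as
    "argmax prediction differs from the true label after the first few epochs".
    """
    preds = early_logits.argmax(axis=1)
    return np.flatnonzero(preds != labels)

def build_augmented_set(images, labels, target_idx, generate, copies):
    """Keep every real example and append `copies` synthetic variants of each target.

    `generate` is a hypothetical stand-in for a diffusion model that returns a
    faithful variant of the input image (same class, different noise).
    """
    new_images, new_labels = list(images), list(labels)
    for i in target_idx:
        for _ in range(copies):
            new_images.append(generate(images[i]))
            new_labels.append(labels[i])
    return new_images, new_labels

# Toy usage: 4 examples, 2 classes; logits taken from an early checkpoint.
logits = np.array([[2.0, 0.1], [0.3, 0.9], [1.5, 0.2], [0.2, 0.1]])
labels = np.array([0, 0, 1, 0])
images = [np.full((2, 2), float(i)) for i in range(4)]

targets = select_targets(logits, labels)  # examples 1 and 2 are misclassified
aug_images, aug_labels = build_augmented_set(
    images, labels, targets, generate=lambda x: x.copy(), copies=2  # identity placeholder for the generator
)
```

Here only examples 1 and 2 get synthetic copies, so the dataset grows from 4 to 8 images rather than by a fixed multiple of its full size.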
Peer Reviews
Decision: ICLR 2026 Poster
The dominant strength is that the paper addresses a neglected question. Current data augmentation papers focus only on generating data with high fidelity and diversity for a more robust decision boundary, and very few study how to balance the real set and the synthetic set during training. This paper fills that gap in generative data augmentation research.
This method is general, but the evaluations are limited. 1/ The evaluated backbones are too weak, and it is unclear whether better-pretrained backbones would overshadow the benefit of your method. 2/ Since this method is a plug-and-play module, why not evaluate it on top of more state-of-the-art methods such as [1,2,3,4]? At a minimum, you should discuss them in the related work. 3/ There are no evaluations on fine-grained datasets. 4/ This method seems like it can be applied not only to image classification datase
- The central idea of targeting slow-learning samples for augmentation is novel and intuitive. Focusing augmentation efforts on more challenging examples seems a logical approach to improving model robustness and generalization. - The paper provides extensive empirical validation across three different datasets, lending credibility to the proposed method's effectiveness. The observation regarding the characteristics of slow-learned samples is particularly interesting and furth
- The theoretical analysis relies on a simplified two-layer CNN assumption. This raises questions about the direct applicability and relevance of the derived theorems to the deeper, more complex architectures commonly used in practice. The paper would be strengthened by a discussion bridging this theoretical gap. - I have concerns regarding the significant computational overhead of the proposed method. Utilizing a diffusion model for data generation, even for a subset of the data, is inherently
**Efficiency**: Augmenting only 30–40% of data outperforms full-dataset augmentation, offering a practical, resource-aware solution. **Empirical Support**: Ablation studies on augmentation factors and initialization provide useful insights. **Compatibility**: Works well with existing methods (e.g., SAM), boosting performance further.
**Missing Prior**: The method overlaps with "Boomerang" [1], which uses similar noise-add-and-denoise techniques for data augmentation in classification, but the paper neither cites nor compares against it. Notably, Boomerang generates synthetic data from the entire dataset and still reports accuracy gains, which contradicts the experiments in this paper. **Theory-Practice Gap**: The claim of mimicking SAM’s feature learning (e.g., sections 4.1–4.2 suggest SAM-like noise suppression and uniform learning) doesn’t fully a
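The "noise-add-and-denoise" idea mentioned in this review can be illustrated with the standard DDPM forward process, q(x_t | x_0) = N(√ᾱ_t · x_0, (1 − ᾱ_t) I). An image is partially noised to an intermediate step t and then passed through a reverse-process model. A small t keeps the result close to the original, so the output stays faithful. A larger t allows more variation. This minimal sketch assumes the DDPM linear β schedule (1e-4 to 0.02 over 1000 steps). `denoise` is a hypothetical placeholder for a trained model. The sketch is not TADA's or Boomerang's actual pipeline.

```python
import numpy as np

def ddpm_alpha_bar(T: int = 1000, beta_start: float = 1e-4, beta_end: float = 0.02) -> np.ndarray:
    """Cumulative products alpha_bar_t for the DDPM linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, T)
    return np.cumprod(1.0 - betas)

def add_noise(x0: np.ndarray, t: int, alpha_bar: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(alpha_bar_t) * x_0, (1 - alpha_bar_t) * I)."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

def noise_and_denoise(x0, t, alpha_bar, denoise, rng):
    """Partially noise x0 to step t, then pass it to `denoise`, a hypothetical
    reverse-process model with signature denoise(x_t, t) -> image."""
    return denoise(add_noise(x0, t, alpha_bar, rng), t)

alpha_bar = ddpm_alpha_bar()
rng = np.random.default_rng(0)
x0 = np.zeros((3, 8, 8))  # toy "image"
x_t = add_noise(x0, t=200, alpha_bar=alpha_bar, rng=rng)
```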
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning · Face recognition and analysis
