Decoupled Data Augmentation for Improving Image Classification
Ruoxin Chen, Zhe Wang, Ke-Yue Zhang, Shuang Wu, Jiamu Sun, Shouli Wang, Taiping Yao, Shouhong Ding

TL;DR
This paper introduces Decoupled Data Augmentation (De-DA), a novel approach that separates images into class-dependent and class-independent parts to improve the balance of fidelity and diversity in image augmentation for classification tasks.
Contribution
The paper proposes a new method that decouples image parts to enhance data augmentation, addressing the fidelity-diversity trade-off more effectively than existing techniques.
Findings
De-DA improves image classification accuracy across multiple datasets.
The method maintains semantic fidelity while increasing diversity.
Empirical results show significant gains over traditional augmentation methods.
Abstract
Recent advancements in image mixing and generative data augmentation have shown promise in enhancing image classification. However, these techniques face the challenge of balancing semantic fidelity with diversity. Specifically, image mixing involves interpolating two images to create a new one, but this pixel-level interpolation can compromise fidelity. Generative augmentation uses text-to-image generative models to synthesize or modify images, often limiting diversity to avoid generating out-of-distribution data that potentially affects accuracy. We propose that this fidelity-diversity dilemma partially stems from the whole-image paradigm of existing methods. Since an image comprises the class-dependent part (CDP) and the class-independent part (CIP), where each part has fundamentally different impacts on the image's fidelity, treating different parts uniformly can therefore be…
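The contrast the abstract draws between whole-image mixing and part-wise treatment can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `mixup` and `compose_cdp_on_cip` are hypothetical, and the binary mask is assumed to be supplied by some external segmentor rather than computed here. The first function interpolates every pixel of two images, while the second pastes only the masked class-dependent pixels onto a separate class-independent background.

```python
import numpy as np


def mixup(img_a, img_b, lam):
    """Whole-image mixing: interpolate every pixel of two images.

    lam in [0, 1] is the weight given to img_a.
    """
    return lam * img_a + (1.0 - lam) * img_b


def compose_cdp_on_cip(cdp_rgb, cdp_mask, cip_rgb):
    """Part-wise composition (illustrative assumption, not the paper's code).

    cdp_rgb:  H x W x 3 image containing the class-dependent part (CDP)
    cdp_mask: H x W mask, 1 where the CDP is, 0 elsewhere (assumed given)
    cip_rgb:  H x W x 3 class-independent background (CIP)
    """
    # Add a channel axis so the mask broadcasts over the RGB channels.
    alpha = cdp_mask[..., None].astype(np.float32)
    return alpha * cdp_rgb + (1.0 - alpha) * cip_rgb


# Tiny 2 x 2 example with constant-colour images.
foreground = np.full((2, 2, 3), 0.2)
background = np.full((2, 2, 3), 0.9)
mask = np.array([[1, 0], [0, 0]])  # only the top-left pixel is "object"

mixed = mixup(foreground, background, lam=0.5)
composed = compose_cdp_on_cip(foreground, mask, background)
```

In the mixed image every pixel is a blend of both sources, whereas in the composed image each pixel comes entirely from either the object or the background, which is one way to read the abstract's point that the two parts affect fidelity differently.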
Peer Reviews
Decision · ICLR 2025 Conference Withdrawn Submission
* The method of dividing images into CDP and CIP regions sounds reasonable and the experiment results show its effectiveness.
* The paper demonstrates the effectiveness of the method on different networks and datasets.
* The paper only conducted experiments on datasets with relatively limited data. It did not illustrate the effectiveness on larger and more general datasets. Can this method work on larger datasets like ImageNet, similar to how mixup or cutmix do?
* In the experiments comparing with RandAugment, only the results of “DE-DA” and “DE-DA + RandAugment” are provided. I believe that further results for “RandAugment only” should be included. Because it would be better to demonstrate that the “DE-DA +
1. This paper introduces a simple yet effective method that balances fidelity and diversity in synthetic data. The core idea is to decouple CIPs and CDPs using an off-the-shelf segmentor, augmenting them separately and then combining them to create new samples. This approach presents a methodological innovation compared to prior generative data augmentation techniques.
2. De-DA utilizes layer-based composition to generate synthetic samples, making the synthesis process highly efficient. This is
1. Although the Online Randomized Combination method is efficient, it may lead to semantically unnatural compositions. For instance, birds in the synthetic samples do not always appear naturally perched on branches (see the last row of birds in Figure 6, where proper positioning on branches is rare). While semantic naturalness may not always be critical for classification, this could reduce the generalizability of synthetic data.
2. The authors should include more visualizations of De-DA results,
- The paper is well-written and is easy to understand
- The experimental results are promising
Limited Novelty. The approach of using background changes to create augmented data has been studied before. For example, InSP [1] swaps the saliency part of two images from the same class and is tested on CUB, Stanford Car, and FGVC-Aircraft datasets. Copy-paste augmentation [2] is a low-cost augmentation method for instance segmentation that copies and pastes a random object into another image. Applying textual inversion and SDEdit to transform objects was suggested in DA-Fusion. The SDEdit and
1. The motivation is sound. The paper proposes separating foreground and background before targeted augmentation, aiming to maintain fidelity while increasing diversity. This approach is reasonable and effective, contrasting with previous methods that targeted the entire image.
2. The paper effectively leverages existing technologies like SAM and LayerDiffuse to propose a new data augmentation framework, yielding valid experimental results.
3. Extensive experimental results and open-source code
**Major:**
1. Limited technical contribution. The paper utilizes existing techniques such as SAM, LayerDiffuse, and text inversion. While these technologies are well-utilized to produce seemingly credible results, there is a lack of novel technical contributions and design. Additionally, performance improvements gained by using SAM are unsurprising and come with increased computational costs. Would the method still work without SAM? Are there alternatives to SAM?
2. The writing needs further po
Taxonomy
Topics: Neural Networks and Applications · Machine Learning and Data Classification
