Data Augmentation Revisited: Rethinking the Distribution Gap between Clean and Augmented Data
Zhuoxun He, Lingxi Xie, Xin Chen, Ya Zhang, Yanfeng Wang, Qi Tian

TL;DR
This paper analyzes the impact of data augmentation on model generalization, revealing a trade-off between reduced generalization error and increased empirical risk, and proposes a simple refinement method to improve accuracy.
Contribution
It offers an analytical perspective on data augmentation as regularization, quantifies the distribution gap, and introduces a refinement approach for better performance.
Findings
Data augmentation reduces generalization error but slightly increases empirical risk.
Using less-augmented data for refinement improves model accuracy.
The approach improves performance on image classification and object detection benchmarks.
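The refinement idea in the findings above can be pictured as a two-phase training schedule: most epochs use the intensive augmentation, and a short final phase uses weaker (or no) augmentation. The sketch below is only an illustration under that assumption; the function name `augmentation_schedule`, the `strong`/`weak` strength values, and the per-epoch granularity are hypothetical and are not taken from the paper.

```python
from typing import List


def augmentation_schedule(
    total_epochs: int,
    refine_epochs: int,
    strong: float = 1.0,
    weak: float = 0.0,
) -> List[float]:
    """Return one augmentation strength per epoch (hypothetical helper).

    The first (total_epochs - refine_epochs) epochs use the strong setting;
    the last refine_epochs epochs use the weak, less-augmented setting.
    """
    if not 0 <= refine_epochs <= total_epochs:
        raise ValueError("refine_epochs must lie between 0 and total_epochs")
    main_epochs = total_epochs - refine_epochs
    return [strong] * main_epochs + [weak] * refine_epochs


if __name__ == "__main__":
    # Five epochs in total, the last two run on less-augmented data.
    print(augmentation_schedule(total_epochs=5, refine_epochs=2))
```

A training loop would read the strength for the current epoch and configure its augmentation pipeline accordingly; how the strength maps to concrete transforms is left open here.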
Abstract
Data augmentation has been widely applied as an effective way to improve generalization, particularly when training deep neural networks. Recently, researchers proposed a few intensive data augmentation techniques, which indeed improved accuracy, yet we notice that the data these methods produce also show a considerable gap from clean data. In this paper, we revisit this problem from an analytical perspective, in which we estimate the upper bound of the expected risk using two terms, namely empirical risk and generalization error. We develop an understanding of data augmentation as regularization, which highlights the major features. As a result, data augmentation significantly reduces the generalization error, but meanwhile leads to a slightly higher empirical risk. On the assumption that data augmentation helps models converge to a better…
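The two-term bound mentioned in the abstract can be written schematically as below. The notation is an assumption made for illustration and does not come from the paper: $R(f)$ stands for the expected risk of a model $f$, $\hat{R}(f)$ for its empirical risk on the training data, and $\mathcal{G}(f)$ for a term bounding the gap between the two (the generalization error).

```latex
R(f) \;\le\; \hat{R}(f) + \mathcal{G}(f)
```

Read this way, the trade-off described in the abstract is that stronger augmentation makes $\mathcal{G}(f)$ much smaller while making $\hat{R}(f)$ slightly larger.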
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques
