A Preliminary Study on Data Augmentation of Deep Learning for Image Classification
Benlin Hu, Cheng Lei, Dong Wang, Shu Zhang, Zhenyu Chen

TL;DR
This study investigates how different data augmentation strategies affect the accuracy of deep learning image classifiers, and derives practical guidelines from three variables: augmentation method, augmentation rate, and base dataset size per label.
Contribution
It offers preliminary insights into the impact of augmentation methods, rates, and dataset sizes on model accuracy, guiding future data augmentation practices.
Findings
Geometric transformations outperform lighting/color adjustments.
An augmentation rate of 2-3 times is sufficient for training.
Smaller datasets benefit more from augmentation.
Abstract
Deep learning models have a large number of free parameters that must be learned by effectively training the models on a great deal of data to improve their generalization performance. However, obtaining and labeling data is expensive in practice. Data augmentation is one of the methods to alleviate this problem. In this paper, we conduct a preliminary study on how three variables (augmentation method, augmentation rate, and size of the basic dataset per label) can affect the accuracy of deep learning for image classification. The study provides some guidelines: (1) it is better to use transformations that alter the geometry of the images rather than those that only alter lighting and color; (2) an augmentation rate of 2-3 times is good enough for training; (3) the smaller the amount of data, the more obvious the contribution of augmentation.
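To make the three variables concrete, here is a minimal NumPy sketch of two transform families (geometric vs. lighting/color) and of an augmentation rate. It assumes images are float arrays of shape (H, W, C) with values in [0, 1], and that a rate of r means the training set ends up r times the size of the base set (the originals plus r - 1 augmented copies of each image). The specific transforms, their parameter ranges, and the helper name `augment_dataset` are illustrative assumptions, not the operations or definitions used in the paper.

```python
import numpy as np

# --- Geometric family (assumed examples): change where pixels are, not their values ---
def flip_horizontal(img, rng):
    return img[:, ::-1]

def flip_vertical(img, rng):
    return img[::-1, :]

def translate(img, rng, max_shift=4):
    # Shift by up to max_shift pixels; np.roll wraps pixels around the border,
    # which is a simplification of a zero-padded translation.
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    return np.roll(img, shift=(dy, dx), axis=(0, 1))

# --- Lighting/color family (assumed examples): change pixel values, not positions ---
def adjust_brightness(img, rng):
    return np.clip(img + rng.uniform(-0.2, 0.2), 0.0, 1.0)

def adjust_contrast(img, rng):
    mean = img.mean()
    return np.clip((img - mean) * rng.uniform(0.7, 1.3) + mean, 0.0, 1.0)

def jitter_color(img, rng):
    # Scale each channel independently.
    scale = rng.uniform(0.8, 1.2, size=img.shape[-1])
    return np.clip(img * scale, 0.0, 1.0)

TRANSFORMS = {
    "geometric": [flip_horizontal, flip_vertical, translate],
    "photometric": [adjust_brightness, adjust_contrast, jitter_color],
}

def augment_dataset(images, labels, rate, family, seed=0):
    """Hypothetical helper: expand a base dataset to `rate` times its size.

    rate=1 returns the originals unchanged; each extra round adds one randomly
    transformed copy of every base image, keeping its label.
    """
    if rate < 1:
        raise ValueError("rate must be >= 1")
    rng = np.random.default_rng(seed)
    ops = TRANSFORMS[family]
    out_images = list(images)
    out_labels = list(labels)
    for _ in range(rate - 1):
        for img, label in zip(images, labels):
            op = ops[rng.integers(len(ops))]
            out_images.append(op(img, rng))
            out_labels.append(label)
    return np.stack(out_images), np.array(out_labels)

# Usage: a toy base set with 2 labels and 5 images per label.
base_images = np.random.default_rng(1).random((10, 32, 32, 3))
base_labels = np.repeat(np.arange(2), 5)
x, y = augment_dataset(base_images, base_labels, rate=3, family="geometric")
print(x.shape, y.shape)  # (30, 32, 32, 3) (30,)
```

With `rate=3`, each base image contributes two augmented copies, so the count per label grows from 5 to 15 while the label balance is preserved. Swapping `family="geometric"` for `family="photometric"` changes only which kind of transformation is applied, which is the contrast the abstract's first guideline refers to.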
Taxonomy
Topics: Machine Learning and Data Classification · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning
