A Study on the Impact of Data Augmentation for Training Convolutional Neural Networks in the Presence of Noisy Labels
Emeson Santana, Gustavo Carneiro, Filipe R. Cordeiro

TL;DR
This paper investigates how different data augmentation techniques affect the robustness of convolutional neural networks trained on noisy labels, demonstrating significant improvements in accuracy across multiple datasets.
Contribution
It provides a comprehensive evaluation of data augmentation strategies' impact on model robustness against label noise, highlighting their potential as a design choice.
Findings
Data augmentation can improve robustness to label noise, with a relative accuracy increase of up to 177.84%.
Proper augmentation selection enhances training with noisy labels.
Combining augmentation with DivideMix yields up to 6% absolute accuracy gain.
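The findings above mix two kinds of figures: a relative increase (177.84%) and an absolute gain (6%). The following is a minimal sketch of how the two differ. The function names and the example accuracies are hypothetical and chosen only for illustration; they are not results from the paper.

```python
def absolute_gain(baseline_acc: float, new_acc: float) -> float:
    """Difference in accuracy, in percentage points."""
    return new_acc - baseline_acc


def relative_gain(baseline_acc: float, new_acc: float) -> float:
    """Improvement expressed as a percentage of the baseline accuracy."""
    if baseline_acc <= 0:
        raise ValueError("baseline accuracy must be positive")
    return 100.0 * (new_acc - baseline_acc) / baseline_acc


# Illustrative numbers only (NOT taken from the paper):
# a model at 20% accuracy that reaches 50% with augmentation.
baseline, augmented = 20.0, 50.0
print(absolute_gain(baseline, augmented))  # gain in percentage points
print(relative_gain(baseline, augmented))  # gain relative to the baseline
```

A relative increase above 100% is possible whenever the baseline is low, which is typical at high noise rates, so the two kinds of figures should not be compared directly.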
Abstract
Label noise is common in large real-world datasets, and its presence harms the training process of deep neural networks. Although several works have focused on training strategies to address this problem, few studies evaluate the impact of data augmentation as a design choice for training deep neural networks. In this work, we analyse model robustness under different data augmentations and how they improve training in the presence of noisy labels. We evaluate state-of-the-art and classical data augmentation strategies with different levels of synthetic noise on the datasets MNIST, CIFAR-10, and CIFAR-100, and on the real-world dataset Clothing1M. We evaluate the methods using the accuracy metric. Results show that the appropriate selection of data augmentation can drastically improve the model robustness to label noise, increasing up to 177.84% of…
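The abstract mentions "different levels of synthetic noise" without saying how the noise is generated. As a rough sketch, and assuming symmetric (uniform) label flipping, which is one common choice and not necessarily the authors' protocol, synthetic noise at a given rate could be injected like this. The function name `inject_symmetric_noise` and its parameters are hypothetical.

```python
import random


def inject_symmetric_noise(labels, num_classes, noise_rate, seed=0):
    """Return a copy of `labels` in which a fraction `noise_rate` of the
    entries is replaced by a different class chosen uniformly at random.

    Assumption: symmetric noise; the source does not specify the noise model.
    """
    if not 0.0 <= noise_rate <= 1.0:
        raise ValueError("noise_rate must be in [0, 1]")
    rng = random.Random(seed)
    noisy = list(labels)
    n_flip = int(round(noise_rate * len(noisy)))
    # Pick which samples get corrupted, without replacement.
    for i in rng.sample(range(len(noisy)), n_flip):
        # Exclude the current label so every selected sample really changes.
        candidates = [c for c in range(num_classes) if c != noisy[i]]
        noisy[i] = rng.choice(candidates)
    return noisy


# Usage: corrupt 40% of the labels of a toy 10-class label list.
clean = [i % 10 for i in range(1000)]
noisy = inject_symmetric_noise(clean, num_classes=10, noise_rate=0.4)
print(sum(a != b for a, b in zip(clean, noisy)))  # number of flipped labels
```

Excluding the original class from the candidates makes the effective noise rate equal to `noise_rate`; some setups instead sample over all classes, which yields a slightly lower effective rate.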
Taxonomy
Topics: Machine Learning and Data Classification · Infrastructure Maintenance and Monitoring · Anomaly Detection Techniques and Applications
Methods: Test
