Understanding the Detrimental Class-level Effects of Data Augmentation
Polina Kirichenko, Mark Ibrahim, Randall Balestriero, Diane Bouchacourt, Ramakrishna Vedantam, Hamed Firooz, Andrew Gordon Wilson

TL;DR
This paper investigates how data augmentation can negatively impact class-specific accuracy in image classification, revealing that certain class types are more vulnerable and proposing targeted strategies to mitigate these effects.
Contribution
It introduces a framework to understand class-level effects of data augmentation and demonstrates how class-conditional augmentation can improve accuracy for affected classes.
Findings
Most affected classes are ambiguous, co-occurring, or fine-grained.
Data augmentation biases models towards specific classes.
Class-conditional augmentation improves accuracy on negatively impacted classes.
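The last finding can be illustrated with a minimal sketch, under stated assumptions. It is not the authors' implementation: the function names, the 0.05 accuracy-drop threshold, the 0.6 "weak" crop scale, and the toy 1-D crop are all hypothetical choices made for illustration (0.08 is only the familiar default lower bound of random-resized-crop in common image libraries). The idea shown is to compare per-class accuracy under weak and strong augmentation, flag the classes that get worse, and apply a milder augmentation to just those classes.

```python
import random
from collections import defaultdict


def per_class_accuracy(labels, preds):
    """Return {class: accuracy} from parallel lists of labels and predictions."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for y, p in zip(labels, preds):
        total[y] += 1
        correct[y] += int(y == p)
    return {c: correct[c] / total[c] for c in total}


def hurt_classes(acc_weak, acc_strong, min_drop=0.05):
    """Classes whose accuracy falls by at least min_drop under strong augmentation.

    min_drop is a hypothetical threshold, not a value from the paper.
    """
    return {
        c for c in acc_weak
        if c in acc_strong and acc_weak[c] - acc_strong[c] >= min_drop
    }


def random_crop_1d(x, min_scale, rng):
    """Toy stand-in for an image crop: keep a random contiguous fraction of a list."""
    scale = rng.uniform(min_scale, 1.0)
    length = max(1, round(len(x) * scale))
    start = rng.randint(0, len(x) - length)  # randint is inclusive on both ends
    return x[start:start + length]


def class_conditional_augment(x, y, affected, rng,
                              strong_min_scale=0.08, weak_min_scale=0.6):
    """Use a milder crop for classes in `affected`, the strong crop otherwise."""
    min_scale = weak_min_scale if y in affected else strong_min_scale
    return random_crop_1d(x, min_scale, rng)


# Toy usage: class 1 loses accuracy under strong augmentation, so it gets the weak crop.
labels = [0, 0, 1, 1]
preds_weak = [0, 0, 1, 1]
preds_strong = [0, 0, 1, 0]
affected = hurt_classes(per_class_accuracy(labels, preds_weak),
                        per_class_accuracy(labels, preds_strong))
rng = random.Random(0)
sample = class_conditional_augment(list(range(10)), 1, affected, rng)
```

In a real pipeline the same per-class switch would sit inside the dataset's transform step, with the flagged class set estimated on held-out data rather than on the training set.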
Abstract
Data augmentation (DA) encodes invariance and provides implicit regularization critical to a model's performance in image classification tasks. However, while DA improves average accuracy, recent studies have shown that its impact can be highly class dependent: achieving optimal average accuracy comes at the cost of significantly hurting individual class accuracy by as much as 20% on ImageNet. There has been little progress in resolving class-level accuracy drops due to a limited understanding of these effects. In this work, we present a framework for understanding how DA interacts with class-level learning dynamics. Using higher-quality multi-label annotations on ImageNet, we systematically categorize the affected classes and find that the majority are inherently ambiguous, co-occur, or involve fine-grained distinctions, while DA controls the model's bias towards one of the closely…
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · COVID-19 diagnosis using AI · Machine Learning and Data Classification
