A Flat Minima Perspective on Understanding Augmentations and Model Robustness
Weebum Yoo, Sung Whan Yoon

TL;DR
This paper provides a theoretical framework linking label-preserving data augmentations to model robustness against distribution shifts, supported by empirical simulations on CIFAR and ImageNet datasets.
Contribution
It introduces a unified theory based on flat minima to explain how diverse augmentations enhance robustness across various distribution shifts.
Findings
Theoretical condition for augmentations to improve robustness
Strong correlation between flat minima and robustness
Empirical validation on CIFAR and ImageNet benchmarks
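As a rough illustration of what "flat minima" means in the findings above, the sketch below estimates sharpness as the average loss increase when parameters are perturbed by random directions of a fixed norm. This is a generic proxy and not necessarily the flatness measure used in the paper. The function names (sharpness, flat_loss, sharp_loss), the toy quadratic losses, and the radius value are all assumptions made for illustration.

```python
import numpy as np

def sharpness(loss_fn, theta, radius=0.1, n_samples=256, seed=0):
    """Average loss increase over random perturbations of norm `radius`.

    Hypothetical helper: a simple flatness proxy, not the paper's definition.
    """
    rng = np.random.default_rng(seed)
    base = loss_fn(theta)
    increases = []
    for _ in range(n_samples):
        direction = rng.normal(size=theta.shape)
        # Rescale so every perturbation has exactly the same norm.
        direction *= radius / np.linalg.norm(direction)
        increases.append(loss_fn(theta + direction) - base)
    return float(np.mean(increases))

# Two toy losses that share the minimizer theta = 0 but differ in curvature.
def flat_loss(theta):
    return 0.5 * 0.1 * np.sum(theta ** 2)   # low curvature: flat minimum

def sharp_loss(theta):
    return 0.5 * 10.0 * np.sum(theta ** 2)  # high curvature: sharp minimum

theta_star = np.zeros(5)
flat = sharpness(flat_loss, theta_star)
sharp = sharpness(sharp_loss, theta_star)
print(f"flat minimum sharpness:  {flat:.6f}")
print(f"sharp minimum sharpness: {sharp:.6f}")
```

A smaller value means the loss changes less around the minimum, which is the sense in which a minimum is called flat. For a real network, loss_fn would evaluate the training or validation loss at the perturbed weights.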
Abstract
Model robustness indicates a model's capability to generalize well on unforeseen distributional shifts, including data corruptions and adversarial attacks. Data augmentation is one of the most prevalent and effective ways to enhance robustness. Despite the great success of the diverse augmentations in different fields, a unified theoretical understanding of their efficacy in improving model robustness is lacking. We theoretically reveal a general condition for label-preserving augmentations to bring robustness to diverse distribution shifts through the lens of flat minima and generalization bound, which de facto turns out to be strongly correlated with robustness against different distribution shifts in practice. Unlike most earlier works, our theoretical framework accommodates all the label-preserving augmentations and is not limited to particular distribution shifts. We substantiate…
Taxonomy
Topics: Model Reduction and Neural Networks
