Generating Less Certain Adversarial Examples Improves Robust Generalization
Minxing Zhang, Michael Backes, Xiao Zhang

TL;DR
This paper shows that generating less certain adversarial examples during training improves a model's robust generalization and mitigates robust overfitting. It formalizes adversarial certainty and develops a method that produces such examples.
Contribution
The paper introduces a formal measure of adversarial certainty, defined through the variance of the model's predicted logits on adversarial examples, links it to robust generalization, and proposes a method to generate less certain adversarial inputs for improved robustness.
Findings
Models trained with less certain adversarial examples show better robust generalization.
The proposed method mitigates robust overfitting in image classification.
Experimental results confirm improved robustness across benchmarks.
Abstract
This paper revisits the robust overfitting phenomenon of adversarial training. Observing that models with better robust generalization performance are less certain in predicting adversarially generated training inputs, we argue that overconfidence in predicting adversarial examples is a potential cause. Therefore, we hypothesize that generating less certain adversarial examples improves robust generalization, and propose a formal definition of adversarial certainty that captures the variance of the model's predicted logits on adversarial examples. Our theoretical analysis of synthetic distributions characterizes the connection between adversarial certainty and robust generalization. Accordingly, built upon the notion of adversarial certainty, we develop a general method to search for models that can generate training-time adversarial inputs with reduced certainty, while maintaining the…
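To make the notion of adversarial certainty more concrete, here is a minimal sketch of a variance-of-logits measure on adversarial examples. It is not the paper's implementation. It assumes a toy linear softmax classifier, a one-step L-infinity attack as the adversarial example generator, and that the variance is taken across class logits per example and then averaged; the abstract does not spell out these details. All function names (`logits_of`, `one_step_linf_attack`, `adversarial_certainty`) are hypothetical.

```python
import math
import random


def logits_of(W, b, x):
    # Linear model: one weight row and one bias per class.
    return [sum(w_j * x_j for w_j, x_j in zip(w, x)) + b_k for w, b_k in zip(W, b)]


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def one_step_linf_attack(W, b, x, y, eps):
    # Assumption: a single signed-gradient step stands in for the attack.
    # For cross-entropy on a linear model, dL/dz_k = p_k - 1[k == y]
    # and dL/dx_j = sum_k dL/dz_k * W[k][j].
    p = softmax(logits_of(W, b, x))
    dz = [p_k - (1.0 if k == y else 0.0) for k, p_k in enumerate(p)]
    grad = [sum(dz[k] * W[k][j] for k in range(len(W))) for j in range(len(x))]
    return [x_j + eps * ((g > 0) - (g < 0)) for x_j, g in zip(x, grad)]


def logit_variance(z):
    # Population variance of one example's logits across classes.
    mean = sum(z) / len(z)
    return sum((v - mean) ** 2 for v in z) / len(z)


def adversarial_certainty(W, b, xs, ys, eps):
    # Assumption: average the per-example logit variance over the
    # adversarial versions of the training inputs.
    variances = [
        logit_variance(logits_of(W, b, one_step_linf_attack(W, b, x, y, eps)))
        for x, y in zip(xs, ys)
    ]
    return sum(variances) / len(variances)


# Toy data: 3 classes, 5 input features, 8 examples.
random.seed(0)
W = [[random.gauss(0, 1) for _ in range(5)] for _ in range(3)]
b = [0.0, 0.0, 0.0]
xs = [[random.gauss(0, 1) for _ in range(5)] for _ in range(8)]
ys = [random.randrange(3) for _ in range(8)]
eps = 0.1

certainty = adversarial_certainty(W, b, xs, ys, eps)
print(f"adversarial certainty (toy linear model): {certainty:.4f}")
```

Under this reading, a lower value means the logits on adversarial inputs are less spread out, so the model is less certain about them. Tracking such a quantity during adversarial training would be one way to compare models in the spirit of the hypothesis above.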
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Advanced Neural Network Applications
