Understanding and Improving Fast Adversarial Training
Maksym Andriushchenko, Nicolas Flammarion

TL;DR
This paper investigates why fast adversarial training methods fail due to catastrophic overfitting, and introduces GradAlign, a regularization technique that improves the robustness and effectiveness of FGSM-based adversarial training.
Contribution
The paper shows that adding a random step to FGSM does not prevent catastrophic overfitting, shows that the failure already occurs in a single-layer convolutional network with a few filters, and proposes GradAlign to improve fast adversarial training.
Findings
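Adding a random step to FGSM does not prevent catastrophic overfitting; its main role is to reduce the magnitude of the perturbation.
Catastrophic overfitting can occur in a single-layer convolutional network with a few filters; in an extreme case, a single filter makes the network highly non-linear locally.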
GradAlign improves FGSM training for larger perturbations.
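This summary describes GradAlign only as a regularizer, so the following is a rough sketch rather than the authors' implementation. It assumes the penalty has the form one minus the cosine similarity between the input gradient at a clean point and the input gradient at a randomly perturbed point inside the ε-ball. The names `grad_align_penalty`, `sample_eta`, `linear_grad`, and `kink_grad` are hypothetical, and the two toy gradient functions stand in for a real network.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity of two vectors; returns 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def sample_eta(dim, eps, rng):
    """Uniform random perturbation inside the l-infinity ball of radius eps."""
    return [rng.uniform(-eps, eps) for _ in range(dim)]


def grad_align_penalty(grad_fn, x, eta):
    """Assumed penalty: 1 - cos(grad at x, grad at x + eta)."""
    g_clean = grad_fn(x)
    g_pert = grad_fn([xi + ei for xi, ei in zip(x, eta)])
    return 1.0 - cosine(g_clean, g_pert)


# Toy model 1: a linear function has the same input gradient everywhere.
def linear_grad(x):
    return [0.5, -2.0]


# Toy model 2: f(x) = |x[0]| + x[1] has a kink at x[0] = 0,
# so its gradient flips sign in the first coordinate across the kink.
def kink_grad(x):
    return [1.0 if x[0] > 0 else -1.0, 1.0]


rng = random.Random(0)
x = [0.01, 0.5]

# Locally linear model: gradients are perfectly aligned, penalty is ~0.
penalty_linear = grad_align_penalty(linear_grad, x, sample_eta(2, 0.03, rng))

# A perturbation that crosses the kink: gradients become orthogonal.
penalty_kink = grad_align_penalty(kink_grad, x, [-0.02, 0.0])

print(penalty_linear, penalty_kink)
```

Under this assumption, the penalty is near zero for a model that is linear inside the perturbation set and grows when the gradient direction changes within it. In training, such a term would be added to the adversarial loss with a weighting coefficient.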
Abstract
A recent line of work focused on making adversarial training computationally efficient for deep learning models. In particular, Wong et al. (2020) showed that ℓ∞-adversarial training with fast gradient sign method (FGSM) can fail due to a phenomenon called "catastrophic overfitting", when the model quickly loses its robustness over a single epoch of training. We show that adding a random step to FGSM, as proposed in Wong et al. (2020), does not prevent catastrophic overfitting, and that randomness is not important per se -- its main role being simply to reduce the magnitude of the perturbation. Moreover, we show that catastrophic overfitting is not inherent to deep and overparametrized networks, but can occur in a single-layer convolutional network with a few filters. In an extreme case, even a single filter can make the network highly non-linear locally, which is the main…
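The point about the random step can be illustrated with a small sketch. It compares plain FGSM with FGSM preceded by a uniform random step on a toy logistic-regression model. The model, the data, the helper names, and the step size α = 1.25ε are assumptions chosen for illustration; none of them come from this summary.

```python
import math
import random


def sign(v):
    """Sign of a scalar as -1, 0, or 1."""
    return (v > 0) - (v < 0)


def input_gradient(w, b, x, y):
    """Gradient of the logistic loss with respect to the input x."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = 1.0 / (1.0 + math.exp(-z))
    return [(p - y) * wi for wi in w]


def fgsm(w, b, x, y, eps):
    """Plain FGSM: one signed-gradient step of size eps from the clean input."""
    g = input_gradient(w, b, x, y)
    return [eps * sign(gi) for gi in g]


def fgsm_rs(w, b, x, y, eps, alpha, rng):
    """FGSM with a random start: uniform step, signed-gradient step, then clip."""
    eta = [rng.uniform(-eps, eps) for _ in x]
    x_start = [xi + ei for xi, ei in zip(x, eta)]
    g = input_gradient(w, b, x_start, y)
    # Project back onto the l-infinity ball of radius eps.
    return [max(-eps, min(eps, ei + alpha * sign(gi))) for ei, gi in zip(eta, g)]


def l2_norm(v):
    return math.sqrt(sum(vi * vi for vi in v))


rng = random.Random(0)
d = 20
w = [rng.choice([-1.0, 1.0]) * rng.uniform(0.1, 1.0) for _ in range(d)]
b = 0.0
x = [rng.uniform(0.0, 1.0) for _ in range(d)]
y = 1
eps = 8 / 255

delta_fgsm = fgsm(w, b, x, y, eps)
delta_rs = fgsm_rs(w, b, x, y, eps, alpha=1.25 * eps, rng=rng)

norm_fgsm = l2_norm(delta_fgsm)  # every coordinate sits at +-eps
norm_rs = l2_norm(delta_rs)      # some coordinates end up strictly inside the ball

print(norm_fgsm, norm_rs)
```

Both perturbations stay inside the same ℓ∞ ball, but plain FGSM always lands on a corner of it, while the random-start variant can leave coordinates strictly inside. Its ℓ2 norm is therefore never larger and is typically smaller, which matches the abstract's remark that the random step mainly reduces the perturbation magnitude.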
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Advanced Neural Network Applications · Anomaly Detection Techniques and Applications
