Eliminating Catastrophic Overfitting Via Abnormal Adversarial Examples Regularization
Runqi Lin, Chaojian Yu, Tongliang Liu

TL;DR
This paper identifies abnormal adversarial examples as a key factor in catastrophic overfitting during single-step adversarial training and proposes a regularization method to prevent their formation, thereby enhancing robustness.
Contribution
The paper introduces AAER, a novel regularization technique that explicitly controls abnormal adversarial examples (AAEs) to prevent catastrophic overfitting in single-step adversarial training (SSAT).
Findings
AAER effectively eliminates catastrophic overfitting.
The method improves adversarial robustness with minimal computational cost.
Experiments validate the correlation between AAEs and classifier distortion.
Abstract
Single-step adversarial training (SSAT) has demonstrated the potential to achieve both efficiency and robustness. However, SSAT suffers from catastrophic overfitting (CO), a phenomenon that leads to a severely distorted classifier, making it vulnerable to multi-step adversarial attacks. In this work, we observe that some adversarial examples generated on the SSAT-trained network exhibit anomalous behaviour: although these training samples are generated by the inner maximization process, their associated loss decreases instead. We name them abnormal adversarial examples (AAEs). Upon further analysis, we discover a close relationship between AAEs and classifier distortion, as both the number and outputs of AAEs undergo a significant variation with the onset of CO. Given this observation, we re-examine the SSAT process and uncover that before the occurrence of CO, the classifier…
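The definition of an AAE in the abstract (a single-step adversarial example whose loss is lower than that of the clean input) can be illustrated with a small sketch. This is not the authors' code: `toy_loss`, `toy_grad`, `single_step_attack`, and `abnormal_mask` are hypothetical names, and the 1-D concave "loss" is an assumption chosen only so that a sign-gradient step can overshoot.

```python
def toy_loss(x):
    # Hypothetical, strongly non-linear 1-D "loss" (assumption, not the paper's model).
    return 1.0 - (x - 0.1) ** 2


def toy_grad(x):
    # Analytic derivative of toy_loss with respect to the input x.
    return -2.0 * (x - 0.1)


def sign(v):
    # Returns -1, 0, or 1.
    return (v > 0) - (v < 0)


def single_step_attack(x, eps):
    # One sign-gradient ascent step of size eps, in the style of a single-step attack.
    return x + eps * sign(toy_grad(x))


def abnormal_mask(inputs, eps):
    # An example is flagged as abnormal when the attack step, which is meant
    # to increase the loss, actually decreases it.
    mask = []
    for x in inputs:
        x_adv = single_step_attack(x, eps)
        mask.append(toy_loss(x_adv) < toy_loss(x))
    return mask


inputs = [0.0, -1.0]
mask = abnormal_mask(inputs, eps=0.5)
print(mask)  # [True, False]
```

For the input 0.0 the step overshoots the peak of the toy loss, so the loss drops and the example is flagged; for -1.0 the step moves uphill as intended. In a real SSAT setting the same comparison would be made per training sample between the loss on the clean input and the loss on its single-step adversarial counterpart.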
Taxonomy
Topics: Industrial Vision Systems and Defect Detection
