Catastrophic overfitting can be induced with discriminative non-robust features
Guillermo Ortiz-Jiménez, Pau de Jorge, Amartya Sanyal, Adel Bibi, Puneet K. Dokania, Pascal Frossard, Gregory Rogéz, Philip H.S. Torr

TL;DR
This paper investigates how non-robust, seemingly innocuous features in images can induce catastrophic overfitting during adversarial training, revealing new insights into the failure mechanisms of robust neural network training.
Contribution
It demonstrates that easy-to-learn non-robust features can trigger catastrophic overfitting at smaller perturbation levels, advancing understanding of adversarial training failures.
Findings
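Non-robust features induce CO at smaller ε (perturbation budget) values than previously observed.
Easy-to-learn features create shortcuts that lead to CO.
Controlled dataset modifications give insight into the dynamics of single-step adversarial training failures.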
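To make the idea of a discriminative but non-robust feature concrete, here is a toy sketch. It is not the paper's construction: the one-hot `features`, the strength `beta`, the `shortcut_predict` classifier, and the ℓ∞ threat model are all assumptions chosen for illustration. The injected feature classifies clean data perfectly, yet a perturbation of size `eps` defeats it whenever `beta < 2 * eps`.

```python
import numpy as np

def inject_class_features(x, y, features, beta):
    """Add a class-specific feature of strength beta to every sample (hypothetical scheme)."""
    return x + beta * features[y]

def shortcut_predict(x, num_classes):
    """Hypothetical shortcut classifier: looks only at the injected coordinates."""
    return np.argmax(x[:, :num_classes], axis=1)

def worst_case_shift(x, y, num_classes, eps):
    """l_inf perturbation against the shortcut: lower the true-class coordinate, raise the rest."""
    delta = np.zeros_like(x)
    delta[:, :num_classes] = eps
    delta[np.arange(len(y)), y] = -eps
    return x + delta

rng = np.random.default_rng(0)
num_classes, n, d = 3, 100, 8
y = rng.integers(0, num_classes, size=n)

# Toy "images": the first num_classes coordinates are empty, the rest are noise.
x = np.zeros((n, d))
x[:, num_classes:] = rng.normal(size=(n, d - num_classes))

features = np.eye(num_classes, d)  # one feature direction per class (assumption)
beta, eps = 0.1, 0.1               # beta < 2 * eps, so the feature is not robust

x_tilde = inject_class_features(x, y, features, beta)
clean_acc = float((shortcut_predict(x_tilde, num_classes) == y).mean())
x_attacked = worst_case_shift(x_tilde, y, num_classes, eps)
adv_acc = float((shortcut_predict(x_attacked, num_classes) == y).mean())
print(clean_acc, adv_acc)
```

In this toy setting the shortcut is easy to learn and fully predictive on clean data, but it provides no robustness on its own at this `eps`.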
Abstract
Adversarial training (AT) is the de facto method for building robust neural networks, but it can be computationally expensive. To mitigate this, fast single-step attacks can be used, but this may lead to catastrophic overfitting (CO). This phenomenon appears when networks gain non-trivial robustness during the first stages of AT, but then reach a breaking point where they become vulnerable in just a few iterations. The mechanisms that lead to this failure mode are still poorly understood. In this work, we study the onset of CO in single-step AT methods through controlled modifications of typical datasets of natural images. In particular, we show that CO can be induced at much smaller ε values than previously observed, just by injecting images with seemingly innocuous features. These features aid non-robust classification but are not enough to achieve robustness on their own.…
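As a rough illustration of what single-step AT means, the sketch below runs one FGSM-style ℓ∞ attack per training step on a logistic-regression model. The choice of FGSM, the ℓ∞ norm, the linear model, and every name here (`fgsm_perturb`, `fgsm_train_step`, `make_toy_data`, `robust_accuracy`) are assumptions for illustration, not the paper's code. For a linear model the single-step attack is already the worst case, so this sketch shows only the training loop and is not expected to reproduce CO.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_perturb(x, y, w, b, eps):
    """Single-step l_inf attack: move each input by eps along the sign of the loss gradient."""
    p = sigmoid(x @ w + b)
    grad_x = (p - y)[:, None] * w[None, :]  # gradient of the cross-entropy loss w.r.t. x
    return x + eps * np.sign(grad_x)

def fgsm_train_step(x, y, w, b, eps, lr):
    """One adversarial training step: attack the batch, then descend on the attacked batch."""
    x_adv = fgsm_perturb(x, y, w, b, eps)
    err = sigmoid(x_adv @ w + b) - y
    grad_w = x_adv.T @ err / len(y)
    grad_b = err.mean()
    return w - lr * grad_w, b - lr * grad_b

def robust_accuracy(x, y, w, b, eps):
    """Accuracy on single-step adversarial examples (eps=0 gives clean accuracy)."""
    x_adv = fgsm_perturb(x, y, w, b, eps)
    return float(((x_adv @ w + b > 0) == (y == 1)).mean())

def make_toy_data(n=200, d=5, seed=0):
    """Two Gaussian blobs with opposite means, labels in {0, 1}."""
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=n).astype(float)
    x = rng.normal(size=(n, d)) + (2 * y[:, None] - 1) * 1.5
    return x, y

x, y = make_toy_data()
w, b = np.full(x.shape[1], 0.1), 0.0
for _ in range(50):
    w, b = fgsm_train_step(x, y, w, b, eps=0.5, lr=0.1)
print(robust_accuracy(x, y, w, b, 0.0), robust_accuracy(x, y, w, b, 0.5))
```

In a deep network the same loop replaces the analytic `grad_x` with a backpropagated gradient. That is the setting in which the abstract describes robustness collapsing within a few iterations.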
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Image Processing Techniques and Applications
