Catastrophic overfitting can be induced with discriminative non-robust features

Guillermo Ortiz-Jiménez; Pau de Jorge; Amartya Sanyal; Adel Bibi; Puneet K. Dokania; Pascal Frossard; Gregory Rogéz; Philip H.S. Torr

arXiv:2206.08242·cs.LG·August 16, 2023·1 cites

TL;DR

This paper investigates how seemingly innocuous, non-robust features injected into images can induce catastrophic overfitting during single-step adversarial training, shedding light on why fast robust training of neural networks fails.

Contribution

It demonstrates that easy-to-learn non-robust features can trigger catastrophic overfitting at much smaller perturbation budgets ($ϵ$) than previously observed, advancing the understanding of adversarial training failures.

Findings

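01 Injecting discriminative non-robust features induces catastrophic overfitting (CO) at much smaller $ϵ$ values than previously observed.

02 Easy-to-learn features create shortcuts that lead to CO.

03 Controlled dataset modifications give insight into the dynamics of adversarial training failures.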
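The shortcut idea in the findings can be made concrete with a toy example. The sketch below is purely illustrative and is not taken from the paper or its repository: the additive injection rule, the one-hot `patterns`, the strength `beta`, and the `shortcut_classifier` are all hypothetical names and choices made for this example. It shows a classifier that reads only an injected class-specific pattern: it is correct on clean inputs, but an $\ell_\infty$ perturbation of size `eps` larger than `beta` overrides the pattern and flips the prediction.

```python
def inject(x, y, beta, patterns):
    """Add the class-specific pattern for label y, scaled by beta, to input x.

    Hypothetical injection rule chosen for illustration only.
    """
    return [xi + beta * vi for xi, vi in zip(x, patterns[y])]


def shortcut_classifier(x, patterns):
    """Predict the class whose pattern correlates most with x.

    This model ignores every coordinate not covered by a pattern.
    """
    scores = [sum(xi * vi for xi, vi in zip(x, v)) for v in patterns]
    return scores.index(max(scores))


# Hypothetical one-hot patterns for a two-class problem.
patterns = [[1, 0, 0, 0], [0, 1, 0, 0]]

# Toy "image" with label 1; its own content lives on the last two coordinates.
x, y = [0.0, 0.0, 0.3, 0.7], 1
beta, eps = 0.05, 0.1  # injected strength is smaller than the attack budget

x_inj = inject(x, y, beta, patterns)
clean_pred = shortcut_classifier(x_inj, patterns)

# Worst-case l_inf perturbation against this classifier: raise the wrong
# pattern coordinate and lower the correct one, each by at most eps.
x_att = [x_inj[0] + eps, x_inj[1] - eps, x_inj[2], x_inj[3]]
adv_pred = shortcut_classifier(x_att, patterns)

print(clean_pred, adv_pred)
```

In this toy setting the injected feature is perfectly discriminative on clean data yet offers no robustness at the chosen budget, which is the sense in which such a feature is "non-robust".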

Abstract

Adversarial training (AT) is the de facto method for building robust neural networks, but it can be computationally expensive. To mitigate this, fast single-step attacks can be used, but this may lead to catastrophic overfitting (CO). This phenomenon appears when networks gain non-trivial robustness during the first stages of AT, but then reach a breaking point where they become vulnerable in just a few iterations. The mechanisms that lead to this failure mode are still poorly understood. In this work, we study the onset of CO in single-step AT methods through controlled modifications of typical datasets of natural images. In particular, we show that CO can be induced at much smaller $ϵ$ values than previously observed, just by injecting images with seemingly innocuous features. These features aid non-robust classification but are not enough to achieve robustness on their own.…
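For readers unfamiliar with single-step AT, the following is a minimal sketch of one common single-step attack, the fast gradient sign method (FGSM), and one training step on the resulting adversarial example. It assumes a toy logistic-regression model written in plain Python; the function names, weights, and hyperparameters are illustrative assumptions, not the paper's setup or code from its repository.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, b, x):
    """Probability of class 1 under a logistic model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def loss(w, b, x, y):
    """Binary cross-entropy on one example, with y in {0, 1}."""
    p = predict(w, b, x)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def sign(v):
    return (v > 0) - (v < 0)


def fgsm(w, b, x, y, eps):
    """Single-step l_inf attack: x + eps * sign(grad_x loss).

    For this model the input gradient is (p - y) * w.
    """
    p = predict(w, b, x)
    grad_x = [(p - y) * wi for wi in w]
    return [xi + eps * sign(gi) for xi, gi in zip(x, grad_x)]


def fgsm_training_step(w, b, x, y, eps, lr):
    """One SGD step on the loss evaluated at the FGSM example."""
    x_adv = fgsm(w, b, x, y, eps)
    p = predict(w, b, x_adv)
    w_new = [wi - lr * (p - y) * xi for wi, xi in zip(w, x_adv)]
    b_new = b - lr * (p - y)
    return w_new, b_new


# Illustrative values only.
w, b = [1.5, -2.0, 0.5], 0.1
x, y = [0.2, -0.4, 0.9], 1
eps, lr = 0.1, 0.5

x_adv = fgsm(w, b, x, y, eps)
clean_loss = loss(w, b, x, y)
adv_loss = loss(w, b, x_adv, y)
w2, b2 = fgsm_training_step(w, b, x, y, eps, lr)
print(clean_loss, adv_loss, loss(w2, b2, x_adv, y))
```

The attack needs a single gradient evaluation per example, which is what makes single-step AT cheap. For a linear model like this one the single step is already the worst-case $\ell_\infty$ perturbation; for deep networks it is only an approximation, and CO concerns the point at which robustness against stronger attacks is suddenly lost during training.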

Code & Models

Repositories

gortizji/co_features (PyTorch, official)

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Image Processing Techniques and Applications