Unveiling the Backdoor Mechanism Hidden Behind Catastrophic Overfitting in Fast Adversarial Training
Mengnan Zhao, Lihe Zhang, Tianhang Zheng, Bo Wang, Baocai Yin

TL;DR
This paper interprets catastrophic overfitting (CO) in Fast Adversarial Training (FAT) as a backdoor-like phenomenon, provides a unified theoretical framework for it, and proposes mitigation strategies inspired by backdoor defenses.
Contribution
It offers a novel backdoor-based interpretation of catastrophic overfitting and introduces mitigation methods guided by this perspective.
Findings
The backdoor interpretation explains the observed CO phenomena.
Backdoor-inspired mitigation strategies alleviate CO.
Experimental results validate the proposed framework.
Abstract
Fast Adversarial Training (FAT) has attracted significant attention due to its efficiency in enhancing neural network robustness against adversarial attacks. However, FAT is prone to catastrophic overfitting (CO), wherein models overfit to the specific attack used during training and fail to generalize to others. While existing methods introduce diverse hypotheses and propose various strategies to mitigate CO, a systematic and intuitive explanation of CO remains absent. In this work, we interpret CO through the lens of backdoor attacks. Through validations on pathway division, diverse feature predictions, and universal, class-distinguishable triggers in CO, we conceptualize CO as a weak-trigger variant of unlearnable tasks, unifying CO, backdoor attacks, and unlearnable tasks under a common theoretical framework. Guided by this, we leverage several backdoor-inspired strategies to…
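In practice, CO is usually spotted empirically: during fast, single-step (typically FGSM-based) adversarial training, accuracy under the training attack stays high while accuracy under a stronger multi-step attack such as PGD collapses within a few epochs. The sketch below is a minimal monitor for that symptom, assuming per-epoch FGSM and PGD robust accuracies have already been measured on a held-out batch. The function name `detect_catastrophic_overfitting` and both thresholds are hypothetical choices for illustration, not part of the paper's method.

```python
from typing import Optional, Sequence


def detect_catastrophic_overfitting(
    fgsm_acc: Sequence[float],
    pgd_acc: Sequence[float],
    drop_threshold: float = 0.2,
    gap_threshold: float = 0.3,
) -> Optional[int]:
    """Return the index of the first epoch that looks like CO, or None.

    Hypothetical heuristic (not from the paper): an epoch is flagged when
    multi-step (PGD) robust accuracy has dropped by more than `drop_threshold`
    below the best PGD accuracy seen in earlier epochs, AND single-step (FGSM)
    accuracy exceeds PGD accuracy by more than `gap_threshold`.
    """
    if len(fgsm_acc) != len(pgd_acc):
        raise ValueError("fgsm_acc and pgd_acc must have the same length")

    best_pgd = float("-inf")  # best PGD accuracy over previous epochs
    for epoch, (fgsm, pgd) in enumerate(zip(fgsm_acc, pgd_acc)):
        collapsed = best_pgd - pgd > drop_threshold  # sudden PGD collapse
        diverged = fgsm - pgd > gap_threshold        # FGSM still looks robust
        if collapsed and diverged:
            return epoch
        best_pgd = max(best_pgd, pgd)
    return None


# Toy, made-up accuracy curves: PGD accuracy collapses at epoch 3
# while FGSM accuracy jumps, the typical CO signature.
fgsm_curve = [0.40, 0.45, 0.48, 0.85, 0.90]
pgd_curve = [0.35, 0.40, 0.42, 0.02, 0.00]
print(detect_catastrophic_overfitting(fgsm_curve, pgd_curve))  # 3
```

Requiring both conditions avoids flagging ordinary noise in PGD accuracy; a training loop could use the returned epoch to stop early or roll back to the last checkpoint before the collapse.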
