Unveiling the Backdoor Mechanism Hidden Behind Catastrophic Overfitting in Fast Adversarial Training

Mengnan Zhao; Lihe Zhang; Tianhang Zheng; Bo Wang; Baocai Yin

arXiv:2604.24350 · cs.LG · April 28, 2026

TL;DR

This paper interprets catastrophic overfitting (CO) in Fast Adversarial Training (FAT) as a backdoor-like mechanism, unifies CO, backdoor attacks, and unlearnable tasks under one theoretical framework, and proposes mitigation strategies inspired by backdoor defenses.

Contribution

It offers a novel backdoor-based interpretation of catastrophic overfitting and introduces mitigation methods guided by this perspective.

Findings

01  Backdoor interpretation effectively explains CO phenomena.

02  Mitigation strategies improve robustness against CO.

03  Experimental results validate the proposed framework.

Abstract

Fast Adversarial Training (FAT) has attracted significant attention due to its efficiency in enhancing neural network robustness against adversarial attacks. However, FAT is prone to catastrophic overfitting (CO), wherein models overfit to the specific attack used during training and fail to generalize to others. While existing methods introduce diverse hypotheses and propose various strategies to mitigate CO, a systematic and intuitive explanation of CO remains absent. In this work, we interpret CO through the lens of backdoor attacks. Through validations on pathway division, diverse feature predictions, and universal class-distinguishable triggers in CO, we conceptualize CO as a weak-trigger variant of unlearnable tasks, unifying CO, backdoor attacks, and unlearnable tasks under a common theoretical framework. Guided by this, we leverage several backdoor-inspired strategies to…
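To make "the specific attack used during training" versus "others" concrete, the following is a minimal sketch, not the authors' code. It assumes the training attack is a single-step, FGSM-style perturbation (common in FAT, but not stated in the abstract) and uses a multi-step, PGD-style attack as the unseen one. The toy logistic model and every function name here are hypothetical and chosen only for illustration.

```python
import math

def sign(v):
    # Returns -1, 0, or 1.
    return (v > 0) - (v < 0)

def logistic_loss(w, b, x, y):
    """Cross-entropy loss of a toy logistic model for a label y in {0, 1}."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = 1.0 / (1.0 + math.exp(-z))
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

def input_gradient(w, b, x, y):
    """Gradient of the logistic loss with respect to the input x."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = 1.0 / (1.0 + math.exp(-z))
    return [(p - y) * wi for wi in w]

def fgsm(w, b, x, y, eps):
    """Single-step attack (assumed training attack): one eps-sized sign step."""
    g = input_gradient(w, b, x, y)
    return [xi + eps * sign(gi) for xi, gi in zip(x, g)]

def pgd(w, b, x, y, eps, step, iters):
    """Multi-step attack: small sign steps, each clipped back into the eps-ball."""
    x_adv = list(x)
    for _ in range(iters):
        g = input_gradient(w, b, x_adv, y)
        x_adv = [xa + step * sign(gi) for xa, gi in zip(x_adv, g)]
        x_adv = [min(max(xa, xi - eps), xi + eps) for xa, xi in zip(x_adv, x)]
    return x_adv

# Hypothetical toy values, chosen only to exercise the functions.
w, b, x, y, eps = [2.0, -1.0], 0.0, [0.5, 0.5], 1, 0.1
x_fgsm = fgsm(w, b, x, y, eps)
x_pgd = pgd(w, b, x, y, eps, step=0.025, iters=10)
```

For this linear toy model the gradient sign never changes, so both attacks end at the same corner of the eps-ball. In a deep network they can diverge, and CO as described in the abstract corresponds to the model staying accurate on the single-step examples while failing on the multi-step ones.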
