Benign Overfitting in Adversarial Training for Vision Transformers
Jiaming Zhang, Meng Ding, Shaopeng Fu, Jingfeng Zhang, Di Wang

TL;DR
This paper provides the first theoretical analysis of adversarial training in Vision Transformers, showing conditions under which it achieves robust generalization and benign overfitting, supported by experiments.
Contribution
It introduces a theoretical framework for adversarial training in simplified ViT architectures and shows that such training can exhibit benign overfitting.
Findings
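Adversarial training drives the robust training loss to nearly zero when the signal-to-noise ratio satisfies a certain condition and the perturbation budget is moderate.
Robust generalization error also stays low despite this overfitting of the training data, i.e., the overfitting is benign.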
Experimental results on synthetic and real datasets validate the theory.
Abstract
Despite the remarkable success of Vision Transformers (ViTs) across a wide range of vision tasks, recent studies have revealed that they remain vulnerable to adversarial examples, much like Convolutional Neural Networks (CNNs). A common empirical defense strategy is adversarial training, yet the theoretical underpinnings of its robustness in ViTs remain largely unexplored. In this work, we present the first theoretical analysis of adversarial training under simplified ViT architectures. We show that, when trained under a signal-to-noise ratio that satisfies a certain condition and within a moderate perturbation budget, adversarial training enables ViTs to achieve nearly zero robust training loss and robust generalization error under certain regimes. Remarkably, this leads to strong generalization even in the presence of overfitting, a phenomenon known as benign overfitting,…
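Adversarial training minimizes the loss at the worst-case input perturbation inside a budget, so the "robust training loss" above is a min-max objective. The sketch below illustrates that objective on a plain linear classifier rather than the paper's simplified ViT, because for a linear score the inner maximization over an l2 ball has a closed form (the worst perturbation lowers the margin by exactly eps times the weight norm). The two-patch toy data, the budget `eps`, and every function name here are hypothetical illustrations chosen for this example, not the authors' data model, architecture, or code.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    return math.sqrt(dot(u, u))


def softplus(z):
    # Numerically stable log(1 + exp(z)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def robust_margin(w, x, y, eps):
    # For a linear score w.x, the worst l2 perturbation with ||delta|| <= eps is
    # delta = -y * eps * w / ||w||, which lowers the margin by exactly eps * ||w||.
    return y * dot(w, x) - eps * norm(w)


def robust_loss(w, data, eps):
    # Mean logistic loss at the worst-case perturbed inputs (inner max in closed form).
    return sum(softplus(-robust_margin(w, x, y, eps)) for x, y in data) / len(data)


def adversarial_train(data, eps, lr=0.1, steps=2000):
    # Plain gradient descent on the robust (min-max) logistic loss.
    d = len(data[0][0])
    w = [1e-3] * d
    for _ in range(steps):
        nw = norm(w)
        grad = [0.0] * d
        for x, y in data:
            coef = -sigmoid(-robust_margin(w, x, y, eps))  # d softplus(-m) / dm
            for j in range(d):
                dm_dwj = y * x[j] - (eps * w[j] / nw if nw > 0 else 0.0)
                grad[j] += coef * dm_dwj / len(data)
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def make_toy_data(n, p, signal, sigma, seed=0):
    # Hypothetical two-patch inputs: a signal patch y * signal * e_1
    # concatenated with an isotropic Gaussian noise patch.
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        y = rng.choice([-1, 1])
        signal_patch = [y * signal] + [0.0] * (p - 1)
        noise_patch = [rng.gauss(0.0, sigma) for _ in range(p)]
        data.append((signal_patch + noise_patch, y))
    return data


data = make_toy_data(n=20, p=4, signal=3.0, sigma=1.0)
eps = 0.5  # hypothetical budget, moderate relative to the signal strength
loss_before = robust_loss([1e-3] * 8, data, eps)
w = adversarial_train(data, eps)
loss_after = robust_loss(w, data, eps)
print(f"robust training loss: {loss_before:.3f} -> {loss_after:.4f}")
```

With `eps` small relative to the signal strength, gradient descent drives the robust training loss toward zero. As `eps` approaches the signal strength, the robust margin attainable along the signal direction shrinks, which only loosely mirrors the abstract's "moderate perturbation budget" condition; the paper's actual conditions concern its own ViT model and are not reproduced here.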
