Defense That Attacks: How Robust Models Become Better Attackers
Mohamed Awad, Mahmoud Akrm, Walid Gomaa

TL;DR
This paper investigates how adversarial training affects the transferability of attacks, revealing that robust models can produce more transferable adversarial examples, which poses new ecosystem risks.
Contribution
It uncovers the paradox that adversarially trained models generate more transferable attacks, challenging assumptions about robustness and transferability.
Findings
Adversarially trained models produce more transferable adversarial examples.
Transferability increases across diverse model architectures, including CNNs and ViTs.
Robustness evaluations should consider attack transferability as a key factor.
Abstract
Deep learning has achieved great success in computer vision, but it remains vulnerable to adversarial attacks. Adversarial training is the leading defense for improving model robustness. However, its effect on the transferability of attacks is underexplored. In this work, we ask whether adversarial training unintentionally increases the transferability of adversarial examples. To answer this, we trained a diverse zoo of 36 models, including CNNs and ViTs, and conducted comprehensive transferability experiments. Our results reveal a clear paradox: perturbations crafted on adversarially trained (AT) models transfer more effectively than those crafted on standard models, which introduces a new ecosystem risk. To enable reproducibility and further study, we release all models, code, and experimental scripts. Furthermore, we argue that robustness evaluations should assess not only the…
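The summary does not spell out how transferability is scored, so the sketch below only illustrates the general idea of a transfer experiment under stated assumptions. An attack is crafted on a *source* model, and the score is the fraction of those adversarial examples that also fool a separate *target* model. The choices here are illustrative and not the authors' protocol: FGSM as the attack, toy linear classifiers instead of CNNs/ViTs, and counting only inputs the target classifies correctly when clean. All names (`predict`, `fgsm_linear`, `transfer_success_rate`) are hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical toy model: a linear binary classifier (weights, bias) that
# predicts 1 if w.x + b > 0, else 0. Stands in for a real CNN/ViT.
LinearModel = Tuple[List[float], float]


def predict(model: LinearModel, x: Sequence[float]) -> int:
    w, b = model
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


def sign(v: float) -> int:
    return (v > 0) - (v < 0)


def fgsm_linear(model: LinearModel, x: Sequence[float], y: int, eps: float) -> List[float]:
    """One FGSM step against a logistic-loss linear model (assumed attack).

    The input gradient of the logistic loss is (sigmoid(w.x + b) - y) * w,
    whose sign is -sign(w) when y == 1 and +sign(w) when y == 0.
    """
    w, _ = model
    direction = -1.0 if y == 1 else 1.0
    return [xi + eps * direction * sign(wi) for xi, wi in zip(x, w)]


def transfer_success_rate(source: LinearModel, target: LinearModel,
                          xs: Sequence[Sequence[float]], ys: Sequence[int],
                          eps: float) -> float:
    """Fraction of source-crafted adversarial examples that fool the target.

    Assumed convention: only inputs the target classifies correctly when
    clean are counted, so the score reflects the attack, not clean errors.
    """
    eligible = fooled = 0
    for x, y in zip(xs, ys):
        if predict(target, x) != y:
            continue  # target is already wrong on the clean input
        eligible += 1
        x_adv = fgsm_linear(source, x, y, eps)  # crafted on the source only
        if predict(target, x_adv) != y:
            fooled += 1
    return fooled / eligible if eligible else 0.0


if __name__ == "__main__":
    source = ([1.0, 1.0], 0.0)
    target = ([1.0, 0.5], 0.0)
    xs = [[0.3, 0.2], [-0.4, -0.1]]
    ys = [1, 0]
    print(transfer_success_rate(source, target, xs, ys, eps=0.5))  # 1.0
    print(transfer_success_rate(source, target, xs, ys, eps=0.2))  # 0.0
```

Repeating this for every (source, target) pair in a model zoo would give a transfer matrix. Comparing rows for AT sources against rows for standard sources is one plausible way to pose the question the abstract raises. This toy example does not reproduce the paper's results.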
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Ethics and Social Impacts of AI · Generative Adversarial Networks and Image Synthesis
