Defense That Attacks: How Robust Models Become Better Attackers

Mohamed Awad; Mahmoud Akrm; Walid Gomaa

arXiv:2512.02830·cs.CV·December 15, 2025

Open Access

TL;DR

This paper investigates how adversarial training affects the transferability of attacks, revealing that robust models can produce more transferable adversarial examples, which poses new ecosystem risks.

Contribution

It uncovers the paradox that adversarially trained models generate more transferable attacks, challenging assumptions about robustness and transferability.

Findings

01

Adversarially trained models produce more transferable adversarial examples.

02

The increase in transferability holds across diverse model architectures, including CNNs and ViTs.

03

Robustness evaluations should consider attack transferability as a key factor.

Abstract

Deep learning has achieved great success in computer vision, but remains vulnerable to adversarial attacks. Adversarial training is the leading defense designed to improve model robustness. However, its effect on the transferability of attacks is underexplored. In this work, we ask whether adversarial training unintentionally increases the transferability of adversarial examples. To answer this, we trained a diverse zoo of 36 models, including CNNs and ViTs, and conducted comprehensive transferability experiments. Our results reveal a clear paradox: adversarially trained (AT) models produce perturbations that transfer more effectively than those from standard models, which introduces a new ecosystem risk. To enable reproducibility and further study, we release all models, code, and experimental scripts. Furthermore, we argue that robustness evaluations should assess not only the…
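The abstract does not say how transferability is scored, so the following is only a minimal sketch of one common convention: the share of adversarial examples crafted against a source model that also fool a different target model, counted over inputs the target classified correctly before the attack. The function names `transfer_success_rate` and `transfer_matrix`, the dictionary layout, and the metric definition itself are assumptions made for illustration, not the authors' released code.

```python
import numpy as np


def transfer_success_rate(y_true, target_clean_pred, target_adv_pred):
    """Share of source-crafted adversarial examples that fool the target model.

    Assumed convention: only inputs the target classifies correctly before
    the attack are counted, so pre-existing errors are not credited to it.
    """
    y_true = np.asarray(y_true)
    clean_ok = np.asarray(target_clean_pred) == y_true
    fooled = np.asarray(target_adv_pred) != y_true
    n_eligible = int(clean_ok.sum())
    if n_eligible == 0:
        return float("nan")  # the target was never correct, rate is undefined
    return float((clean_ok & fooled).sum() / n_eligible)


def transfer_matrix(y_true, clean_preds, adv_preds):
    """Build a source -> target table of transfer success rates.

    clean_preds[t]   : predictions of model t on the clean inputs.
    adv_preds[s][t]  : predictions of model t on examples crafted against s.
    Both layouts are hypothetical and chosen only for this sketch.
    """
    names = list(clean_preds)
    return {
        s: {
            t: transfer_success_rate(y_true, clean_preds[t], adv_preds[s][t])
            for t in names
            if t != s  # skip the white-box case where source equals target
        }
        for s in names
    }
```

Under this convention, the paradox described in the abstract would show up as higher rates in the rows whose source model is adversarially trained than in the rows whose source is a standard model.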

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Ethics and Social Impacts of AI · Generative Adversarial Networks and Image Synthesis