Adversarially Pretrained Transformers May Be Universally Robust In-Context Learners
Soichiro Kumano, Hiroshi Kera, Toshihiko Yamasaki

TL;DR
This paper presents a theoretical analysis showing that adversarially pretrained transformers can serve as universally robust models capable of adapting to diverse tasks with minimal tuning, offering a promising direction for robust AI systems.
Contribution
It introduces the concept of universally robust foundation models trained via adversarial pretraining, capable of in-context learning on unseen tasks without additional adversarial training.
Findings
Single-layer linear transformers generalize robustly to unseen tasks.
Adversarial pretraining enables models to focus on robust features.
Training for universality is expensive but beneficial for downstream robustness.
Abstract
Adversarial training is one of the most effective defenses against adversarial attacks, but it incurs a high computational cost. In this study, we present the first theoretical analysis suggesting that adversarially pretrained transformers can serve as universally robust foundation models -- models that can adapt robustly to diverse downstream tasks with only lightweight tuning. Specifically, we demonstrate that single-layer linear transformers, after adversarial pretraining across a variety of classification tasks, can generalize robustly to unseen classification tasks through in-context learning from clean demonstrations (i.e., without requiring additional adversarial training or examples). This universal robustness stems from the model's ability to adaptively focus on robust features within given tasks. We also identify two open challenges for attaining robustness: the…
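The idea of "adaptively focusing on robust features" can be illustrated with a toy sketch. It is not the paper's construction: it assumes a common simplification in which a single-layer linear transformer's in-context prediction reduces to a linear score `w · x_query` with `w = M * mean(y_i * x_i)` computed from clean demonstrations. The diagonal weighting `m_diag`, the feature layout (one strong feature plus 20 weak ones), and all function names are hypothetical. The only fact relied on is that, for a linear score, the worst-case l_inf perturbation of radius `eps` lowers the margin by `eps * ||w||_1`.

```python
def icl_weight(demos, m_diag):
    """Effective linear classifier built from clean demonstrations.

    w = m_diag * mean(y_i * x_i); m_diag is a hypothetical diagonal
    stand-in for whatever the pretrained attention weights encode.
    """
    n = len(demos)
    d = len(demos[0][0])
    mean = [sum(y * x[j] for x, y in demos) / n for j in range(d)]
    return [m * v for m, v in zip(m_diag, mean)]


def score(w, x):
    """Linear score w . x for a query x."""
    return sum(wj * xj for wj, xj in zip(w, x))


def robust_margin(w, x, y, eps):
    """Worst-case margin under an l_inf perturbation of radius eps.

    min over ||delta||_inf <= eps of y * w.(x + delta)
      = y * w.x - eps * ||w||_1
    """
    return y * score(w, x) - eps * sum(abs(wj) for wj in w)


# Toy task (assumed): one strong feature and 20 weak, label-correlated ones.
base = [1.0] + [0.1] * 20
demos = [([y * b for b in base], y) for y in (1, -1, 1, -1)]
query, label, eps = base, 1, 0.5

# Weighting every feature equally vs. keeping only the strong feature.
w_full = icl_weight(demos, [1.0] * 21)
w_focused = icl_weight(demos, [1.0] + [0.0] * 20)

full_clean = robust_margin(w_full, query, label, 0.0)
full_margin = robust_margin(w_full, query, label, eps)
focused_margin = robust_margin(w_focused, query, label, eps)

print(full_clean > 0, full_margin > 0, focused_margin > 0)
# prints: True False True
```

In this toy setting the equally weighted classifier is correct on the clean query but can be flipped by a perturbation of radius 0.5, because the many weak features inflate `||w||_1`. The classifier that keeps only the strong feature retains a positive worst-case margin.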
Peer Reviews
Decision·ICLR 2026 Poster
- Generally well-written paper.
- The core insight of the paper (Theorem 3.6), that universally robust linear transformers are realizable for a very particular class of classification tasks, is moderately surprising and interesting in my opinion.
- The other main results of the paper (the accuracy-robustness trade-off, the need for larger in-context datasets) are well presented, though not that surprising.
- The theoretical results are interpretable and provide some intuition about the accuracy-robustness trade-off.
- The
Most of the weaknesses of the paper relate to the narrow assumptions the authors make to keep the theoretical results tractable. While typical of theory papers, these are nevertheless weaknesses, as they limit the relevance of the results to practical contexts.
- The paper only studies single-layer linear transformers. This is a major weakness of the paper; as all the results pertain to this narrow class of models, I am not certain whether there is something special about this class of models, or
1. The paper is in general well written, with clear notation and explanations.
2. The authors provide a formal analysis that proves task-transferable robustness under a matched threat model.
3. The authors provide a good experimental setup with a clear sample creation scheme.
1. The proofs are for single-layer linear transformers with matched adversaries; the paper text, as described, markets this as universally robust foundation models. For it to be a universally robust model, I would expect transferability across different perturbation classes and distribution-shift setups. The whole story depends on the downstream adversary matching (or being close to) the one used in pretraining.
2. Another key theoretical baseline is missing: the authors did not compare the no
- The paper is clearly written and easy to follow.
- The topic is promising. If robustness can indeed be efficiently achieved during pre-training and transferred to downstream tasks, it would be highly meaningful.
- The paper seems to provide solid and convincing theoretical analysis.
- The authors also offer empirical results that provide a certain level of support for the proposed approach.
- The paper is limited to single-layer linear transformers, and it remains unclear whether the theoretical results can generalize to more realistic multi-layer non-linear transformers, which form the basis of modern foundation models.
- The empirical evaluation is also restricted to single-layer linear transformers. I suggest conducting experiments on multi-layer non-linear transformers and on more complex datasets to better demonstrate the generalizability of the conclusions (not necessarily la
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Explainable Artificial Intelligence (XAI)
Methods: Focus
