Understanding In-Context Learning of Linear Models in Transformers Through an Adversarial Lens
Usman Anwar, Johannes Von Oswald, Louis Kirsch, David Krueger, Spencer Frei

TL;DR
This paper investigates the adversarial vulnerability of in-context learning of linear models in transformers. It shows that these models are susceptible to hijacking attacks, that adversarial training can improve their robustness, and that their vulnerabilities differ from those of traditional algorithms for learning linear models.
Contribution
It provides the first analysis of adversarial robustness in in-context learning of linear models by transformers and compares vulnerabilities across models and algorithms.
Findings
Transformers are vulnerable to hijacking attacks.
Adversarial training improves robustness significantly.
Attacks transfer poorly both between different transformer models and between transformers and traditional algorithms.
Abstract
In this work, we make two contributions toward understanding in-context learning of linear models by transformers. First, we investigate the adversarial robustness of in-context learning in transformers to hijacking attacks -- a type of adversarial attack in which the adversary's goal is to manipulate the prompt to force the transformer to generate a specific output. We show that both linear transformers and transformers with GPT-2 architectures are vulnerable to such hijacking attacks. However, adversarial robustness to such attacks can be significantly improved through adversarial training -- done either at the pretraining or finetuning stage -- and can generalize to stronger attack models. Our second main contribution is a comparative analysis of adversarial vulnerabilities across transformer models and other algorithms for learning linear models. This reveals two novel…
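To make the idea of a hijacking attack concrete, here is a minimal sketch in Python with NumPy. It is not the paper's attack procedure and it does not involve a transformer: ordinary least squares stands in as the in-context learner, and the adversary is assumed to be allowed to change the label of a single in-context example. The function names (`ols_predict`, `hijack_one_label`), the problem sizes, and the target value are all hypothetical choices made for illustration.

```python
import numpy as np


def ols_predict(X, y, x_query):
    """Fit least squares on the in-context examples and predict for the query."""
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(x_query @ w)


def hijack_one_label(X, y, x_query, y_target):
    """Change one in-context label so the OLS prediction equals y_target.

    The OLS prediction is linear in the labels: pred = a @ y with
    a = X (X^T X)^{-1} x_query, so shifting label k by delta moves the
    prediction by a[k] * delta. (Assumes X has full column rank.)
    """
    a = X @ np.linalg.solve(X.T @ X, x_query)
    k = int(np.argmax(np.abs(a)))  # most influential example, for stability
    y_adv = y.copy()
    y_adv[k] += (y_target - float(a @ y)) / a[k]
    return y_adv, k


# Toy prompt: n noiseless examples of a d-dimensional linear model (assumed sizes).
rng = np.random.default_rng(0)
n, d = 20, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true
x_query = rng.normal(size=d)

y_target = 100.0  # arbitrary output the adversary wants to force
y_adv, k = hijack_one_label(X, y, x_query, y_target)

print("clean prediction:   ", ols_predict(X, y, x_query))
print("hijacked prediction:", ols_predict(X, y_adv, x_query))
print("perturbed example index:", k)
```

Because the stand-in learner is linear in the labels, a single closed-form edit suffices here. An attack on a trained transformer would presumably need an iterative, gradient-based search over the perturbed prompt tokens instead.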
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Domain Adaptation and Few-Shot Learning
Methods: Attention Is All You Need · Cosine Annealing · Linear Warmup With Cosine Annealing · Dense Connections · Layer Normalization · Adam · Attention Dropout · Linear Layer · Linear Regression
