On the Robustness of Transformers against Context Hijacking for Linear Classification
Tianle Li, Chenyang Zhang, Xingwu Chen, Yuan Cao, Difan Zou

TL;DR
This paper investigates the robustness of transformer models against context hijacking in linear classification, providing theoretical analysis and empirical evidence that deeper transformers are more resistant to interference from factually correct but misleading context tokens.
Contribution
The paper offers a theoretical framework, built on linear transformers, that explains how model depth influences robustness to context hijacking, and supports it with numerical experiments.
Findings
Deeper transformers exhibit higher robustness to context hijacking.
Greater depth lets the model carry out more fine-grained optimization steps, which reduces interference from hijacking tokens.
The theoretical predictions agree with the empirical results on robustness.
Abstract
Transformer-based Large Language Models (LLMs) have demonstrated powerful in-context learning capabilities. However, their predictions can be disrupted by factually correct context, a phenomenon known as context hijacking, revealing a significant robustness issue. To understand this phenomenon theoretically, we explore an in-context linear classification problem based on recent advances in linear transformers. In our setup, context tokens are designed as factually correct query-answer pairs, where the queries are similar to the final query but have opposite labels. Then, we develop a general theoretical analysis on the robustness of the linear transformers, which is formulated as a function of the model depth, training context lengths, and number of hijacking context tokens. A key finding is that a well-trained deeper transformer can achieve higher robustness, which aligns with…
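The hijacking setup described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the paper's construction: the function name `make_hijacked_prompt`, the parameters `eps` and `noise`, and the choice to build hijacking queries by pushing the final query just across the decision boundary of a hypothetical ground-truth vector `w_star`. The labels of the hijacking pairs are the true labels under `w_star`, so the context stays factually correct while disagreeing with the label of the final query.

```python
import numpy as np

def make_hijacked_prompt(w_star, n_hijack, eps=0.5, noise=0.05, seed=0):
    """Return a final query and n_hijack nearby context pairs with the opposite label.

    Illustrative assumption: labels are sign(<w_star, x>), and hijacking
    queries are copies of the final query moved across the hyperplane
    <w_star, x> = 0, plus small noise parallel to that hyperplane.
    """
    rng = np.random.default_rng(seed)
    d = w_star.shape[0]
    w_unit = w_star / np.linalg.norm(w_star)

    # Final query and its true label.
    x_query = rng.standard_normal(d)
    margin = w_unit @ x_query
    y_query = np.sign(margin)

    # Noise restricted to the subspace orthogonal to w_unit,
    # so it cannot change which side of the boundary a point is on.
    z = noise * rng.standard_normal((n_hijack, d))
    z -= np.outer(z @ w_unit, w_unit)

    # Move the query across the boundary: the new margin is -eps * margin.
    x_hijack = x_query - (1.0 + eps) * margin * w_unit + z

    # Labels are computed from w_star, so every pair is factually correct.
    y_hijack = np.sign(x_hijack @ w_star)
    return x_query, y_query, x_hijack, y_hijack
```

Calling `make_hijacked_prompt(w_star, n_hijack=5)` yields five context pairs whose labels all equal `-y_query`. Smaller values of `eps` place the hijacking queries closer to the boundary and therefore closer to the final query.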
