CALYREX: Cross-Attention LaYeR EXtended Transformers for System Prompt Anchoring
Li Lixing

TL;DR
CALYREX introduces a cross-attention mechanism in transformers to better anchor system prompts, improving instruction-following and safety in large language models, especially at larger scales.
Contribution
It proposes a novel cross-attention architecture that isolates system prompts, with empirical evidence showing improved instruction adherence and safety over standard models.
Findings
CALYREX improves instruction-following accuracy by 7.4% on IFEval.
It reduces the success rate of multi-turn jailbreaking attacks by 13%.
The optimal insertion point for the cross-attention is the final eighth of the layers, identified by a placement ablation on a 1.5B backbone and confirmed by activation analysis.
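As a rough illustration of what "the final eighth of layers" means in terms of layer indices, the sketch below lists those indices for a given model depth. The function name, the zero-based indexing, and the rounding rule (floor, with at least one layer) are assumptions made for illustration; this summary does not say how the paper handles depths that are not divisible by eight.

```python
def final_eighth_layers(num_layers: int) -> list[int]:
    """Return zero-based indices of the last eighth of the layers.

    Assumption: the count is floor(num_layers / 8), but never less than 1.
    """
    if num_layers < 1:
        raise ValueError("num_layers must be positive")
    count = max(1, num_layers // 8)
    return list(range(num_layers - count, num_layers))


# For a hypothetical 32-layer model, the last four layers qualify.
print(final_eighth_layers(32))  # [28, 29, 30, 31]
```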
Abstract
Modern large language models (LLMs) rely on system prompts to establish behavioral constraints and safety rules. Standard causal self-attention treats privileged instructions and untrusted user content with equal structural priority -- a mismatch that leaves models vulnerable to prompt injection and instruction erosion over extended contexts. We propose CALYREX (Cross-Attention LaYeR EXtended transformers), which uses cross-attention between the input and the system prompt to structurally isolate and anchor the rules. A placement ablation on a 1.5B backbone identifies insertion at the final eighth of layers as optimal, confirmed by mechanistic activation analysis showing behavioral constraints are naturally concentrated there. At 8B scale, controlling for training data, backbone, and parameter budget, CALYREX yields a 7.4% gain on instruction-following (IFEval) and a 13% reduction in attack success rate on multi-turn…
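To make the idea of cross-attention between the input and the system prompt concrete, here is a minimal single-head sketch in plain Python. It is not the paper's implementation: it omits learned query/key/value projections, multiple heads, and normalization, and the function names and the residual connection are assumptions chosen for illustration. The only point it shows is that queries come from the input tokens while keys and values come solely from the system-prompt tokens, so the prompt is read through a separate pathway.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(input_states, prompt_states):
    """Scaled dot-product cross-attention (single head, no projections).

    input_states:  one vector per input token (used as queries).
    prompt_states: one vector per system-prompt token (keys and values).
    Returns the input states plus the attended prompt context (residual).
    """
    dim = len(prompt_states[0])
    scale = 1.0 / math.sqrt(dim)
    outputs = []
    for query in input_states:
        # Score this input token against every system-prompt token.
        scores = [
            scale * sum(q * k for q, k in zip(query, key))
            for key in prompt_states
        ]
        weights = softmax(scores)
        # Weighted sum of prompt vectors, one coordinate at a time.
        context = [
            sum(w * value[j] for w, value in zip(weights, prompt_states))
            for j in range(dim)
        ]
        outputs.append([q + c for q, c in zip(query, context)])
    return outputs


# With a single prompt token, every input token attends to it fully.
print(cross_attention([[1.0, 0.0]], [[0.5, 0.5]]))  # [[1.5, 0.5]]
```

Because the input tokens never serve as keys or values here, nothing in the user content can change what the prompt vectors contain; it can only change how strongly each one is attended to.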
