Instruction Following by Principled Boosting Attention of Large Language Models
Vitoria Guardieiro, Avishree Khare, Adam Stein, Eric Wong

TL;DR
This paper introduces a theory and a method for improving instruction following in large language models by boosting attention to instruction tokens. The approach targets reliability and safety without retraining, and the authors report that it is effective across multiple tasks.
Contribution
The paper formalizes attention steering as rule-based competition and proposes InstABoost, a simple additive bias method that improves instruction adherence while maintaining task relevance.
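As a rough illustration of what an additive attention bias can look like, the sketch below adds a constant to the pre-softmax attention scores of instruction tokens for a single query and then renormalizes. This is a toy under stated assumptions, not the authors' implementation: the summary does not say where the bias is applied, how large it is, or which layers and heads are affected, and the names `boosted_attention`, `instruction_mass`, and `bias` are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of attention scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def boosted_attention(
    scores: Sequence[float],
    is_instruction: Sequence[bool],
    bias: float,
) -> List[float]:
    """Add `bias` to the scores of instruction tokens, then renormalize.

    Assumption: the bias is applied to pre-softmax scores. The summary
    only calls the method a "simple additive bias".
    """
    if len(scores) != len(is_instruction):
        raise ValueError("scores and is_instruction must have equal length")
    shifted = [s + bias if flag else s for s, flag in zip(scores, is_instruction)]
    return softmax(shifted)


def instruction_mass(weights: Sequence[float], is_instruction: Sequence[bool]) -> float:
    """Total attention weight that lands on instruction tokens."""
    return sum(w for w, flag in zip(weights, is_instruction) if flag)


# Toy example: two instruction tokens followed by three context tokens.
scores = [0.2, -0.1, 1.5, 0.7, 1.1]
is_instruction = [True, True, False, False, False]

base = softmax(scores)
boosted = boosted_attention(scores, is_instruction, bias=2.0)

print("instruction mass before:", round(instruction_mass(base, is_instruction), 3))
print("instruction mass after: ", round(instruction_mass(boosted, is_instruction), 3))
```

In this toy setting a positive bias always shifts attention mass toward the instruction tokens, and a bias of zero leaves the weights unchanged. How large a bias can be used before output quality degrades is an empirical question that this sketch does not address.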
Findings
InstABoost outperforms existing methods in instruction following.
It balances instruction adherence and task relevance effectively.
The method avoids issues like fluency collapse and over-focus seen in prior approaches.
Abstract
Large language models' behavior is often shaped by instructions such as system prompts, refusal boundaries, privacy constraints, and tool-use rules that must hold at inference time. Yet in practice these constraints can be violated under long contexts or when user-provided context conflicts with them, creating reliability and safety risks. This motivates inference-time interventions that strengthen instruction influence without retraining. One such intervention is attention steering, which biases attention toward instruction tokens. In this work, we present a unifying theory for attention steering methods by formalizing instruction following as rule-based competition between instruction rules and context-derived rules, with attention mediating which rules dominate. We prove that boosting attention to instruction tokens tilts this competition, making it harder for context to override…
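To make the intuition behind "boosting attention tilts the competition" concrete, here is an elementary softmax identity. It is not the paper's theorem, only a worked illustration. It assumes a constant bias b is added to the pre-softmax scores s_j of the instruction tokens (index set I) for one query. The symbols s_j, I, b, A, and A' are introduced here for illustration.

```latex
% Unboosted attention weights and total instruction mass
\alpha_j = \frac{e^{s_j}}{\sum_k e^{s_k}}, \qquad
A = \sum_{j \in I} \alpha_j

% Boosted weights: add b to instruction-token scores only
\alpha'_j = \frac{e^{s_j + b\,\mathbf{1}[j \in I]}}{\sum_k e^{s_k + b\,\mathbf{1}[k \in I]}}

% Resulting instruction mass
A' = \sum_{j \in I} \alpha'_j = \frac{e^{b} A}{e^{b} A + (1 - A)}
```

For b > 0 and 0 < A < 1 this gives A' > A, and A' approaches 1 as b grows, so under this assumption the share of attention going to context tokens shrinks monotonically with the bias.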
Taxonomy
Topics: Topic Modeling · Artificial Intelligence in Healthcare and Education · Multimodal Machine Learning Applications
Methods: InstABoost
