LLMSteer: Improving Long-Context LLM Inference by Steering Attention on Reused Contexts
Zhuohan Gu, Jiayi Yao, Kuntai Du, Junchen Jiang

TL;DR
LLMSteer is a novel framework that improves long-context LLM inference by steering attention on reused contexts, significantly enhancing efficiency and performance without additional fine-tuning.
Contribution
It introduces a fine-tuning-free attention steering method that reduces computational costs and improves long-context understanding in LLMs.
Findings
Narrowed performance gap with baselines by 65.9%.
Reduced runtime delay by up to 4.8x compared to recent attention steering methods.
Effective across popular LLMs and datasets.
Abstract
Although large language models (LLMs) show impressive performance on complex tasks, they still struggle with long-context understanding and high computational costs. To balance efficiency and quality, we introduce LLMSteer, a fine-tuning-free framework that enhances LLMs through query-independent attention steering. Tested on popular LLMs and datasets, LLMSteer narrows the performance gap with baselines by 65.9% and reduces the runtime delay by up to 4.8x compared to recent attention steering methods.
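The abstract does not spell out the steering mechanism, so the following is only a generic sketch of what attention steering can look like, not LLMSteer's actual algorithm. It assumes steering means multiplying the attention weights of a fixed set of context tokens by a constant and renormalizing. The names `steer_attention`, `steered_positions`, and `scale`, along with all numeric values, are hypothetical. The "query-independent" part is modeled by choosing `steered_positions` once per context, without looking at any query, so the same choice could be reused whenever the context is reused.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of raw attention scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def steer_attention(weights, steered_positions, scale):
    """Up-weight attention on selected key positions, then renormalize.

    Assumption: steering is a multiplicative boost on post-softmax weights.
    The source does not confirm this is how LLMSteer steers attention.
    """
    boosted = [
        w * scale if i in steered_positions else w
        for i, w in enumerate(weights)
    ]
    total = sum(boosted)
    return [b / total for b in boosted]

# Hypothetical raw attention scores of one query token over four context tokens.
scores = [0.2, 1.0, 0.1, 0.7]

# Hypothetical set of context tokens to emphasize, fixed per context
# (chosen without reference to the query).
steered_positions = {1, 3}

weights = softmax(scores)
steered = steer_attention(weights, steered_positions, scale=3.0)
```

In this sketch the steered weights still sum to 1, the selected tokens gain attention mass at the expense of the others, and the relative ordering within the selected set is unchanged.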
Taxonomy
Topics: Natural Language Processing Techniques
Methods: Softmax · Attention Is All You Need
