LLMSteer: Improving Long-Context LLM Inference by Steering Attention on Reused Contexts

Zhuohan Gu; Jiayi Yao; Kuntai Du; Junchen Jiang

arXiv:2411.13009·cs.LG·November 22, 2024

Open Access

TL;DR

LLMSteer is a fine-tuning-free framework that improves long-context LLM inference by steering attention on reused contexts, narrowing the quality gap with baselines while reducing runtime delay relative to recent attention steering methods.

Contribution

It introduces a fine-tuning-free attention steering method that reduces computational costs and improves long-context understanding in LLMs.

Findings

01 Narrowed the performance gap with baselines by 65.9%.

02 Reduced runtime delay by up to 4.8x compared to recent attention steering methods.

03 Effective across the popular LLMs and datasets tested.

Abstract

Although large language models (LLMs) show impressive performance on complex tasks, they still struggle with long-context understanding and high computational costs. To balance efficiency and quality, we introduce LLMSteer, a fine-tuning-free framework that enhances LLMs through query-independent attention steering. Tested on popular LLMs and datasets, LLMSteer narrows the performance gap with baselines by 65.9% and reduces the runtime delay by up to 4.8x compared to recent attention steering methods.
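To make the idea of attention steering concrete, the following is a minimal generic sketch, not the paper's implementation. It upweights the attention a single query places on a chosen set of context positions and renormalizes. The function names, the `steered_idx` selection, and the scaling factor `alpha` are all hypothetical; the abstract does not specify how LLMSteer selects tokens or how strongly it reweights them.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def steer_attention(scores, steered_idx, alpha=2.0):
    """Return attention weights with selected positions upweighted.

    scores:      raw attention logits of one query over the context tokens.
    steered_idx: hypothetical set of context positions to emphasize; in a
                 query-independent scheme it is chosen without the query.
    alpha:       hypothetical scaling factor (> 1 emphasizes the selection).
    """
    weights = softmax(scores)
    # Scale the selected positions, leave the others unchanged.
    scaled = [w * alpha if i in steered_idx else w
              for i, w in enumerate(weights)]
    # Renormalize so the weights again sum to 1.
    total = sum(scaled)
    return [w / total for w in scaled]


# Uniform logits over four context tokens; emphasize position 1.
logits = [0.0, 0.0, 0.0, 0.0]
steered = steer_attention(logits, steered_idx={1}, alpha=3.0)
```

Because the selection in this sketch does not depend on the query, it could in principle be computed once per context and reused across queries, which is the kind of reuse the title refers to. How LLMSteer actually realizes this is described in the paper itself, not here.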

Taxonomy

Topics: Natural Language Processing Techniques

Methods: Softmax · Attention Is All You Need