Pay Attention to What You Need

Yifei Gao; Shaohong Chen; Lei Wang; Ruiting Dai; Ziyun Zhang; Kerui Ren; Jiaji Wu; Jun Cheng

arXiv:2307.13365 · cs.CL · February 25, 2025

TL;DR

This paper introduces Scaled ReAttention (SRA), an inference-time method that improves large language models' long-context comprehension by manipulating attention scores, with no retraining or fine-tuning.

Contribution

The paper proposes SRA, a new attention manipulation technique that improves LLMs' understanding of long contexts without additional training resources.

Findings

1. SRA significantly improves LLMs' performance on downstream tasks.
2. The method enhances language understanding without retraining.
3. Experimental results validate the practical effectiveness of SRA.

Abstract

Although large language models (LLMs) have achieved significant success in natural language processing, they still struggle with long-context comprehension. Traditional approaches to mitigating this issue typically rely on fine-tuning or retraining, which is both resource-intensive and challenging to deploy in lightweight industrial settings. In this paper, we investigate the potential to accomplish this without any additional resources. Through an in-depth study of the attention mechanism in LLMs, we propose a method called Scaled ReAttention (SRA) to strengthen LLMs' ability to interpret and retrieve information by strategically manipulating their attention scores during inference. Through extensive experiments, we demonstrate that integrating SRA significantly boosts LLMs' performance on a variety of downstream tasks, highlighting its practical potential for enhancing language…
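The abstract does not spell out how SRA changes attention scores, so the following is only a generic illustration of inference-time attention manipulation, not the authors' algorithm. It assumes a single query with raw scores over a handful of key tokens, and it uses a hypothetical helper, `rescale_attention`, that multiplies the softmax weights at chosen positions by a factor and renormalizes. The names `focus_positions` and `scale` are likewise assumptions made for this sketch.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of raw attention scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rescale_attention(scores, focus_positions, scale):
    """Hypothetical helper (not from the paper): multiply the attention
    weights at `focus_positions` by `scale`, then renormalize so the
    weights again sum to 1. No model parameters are touched."""
    weights = softmax(scores)
    focus = set(focus_positions)
    boosted = [w * scale if i in focus else w for i, w in enumerate(weights)]
    total = sum(boosted)
    return [b / total for b in boosted]


if __name__ == "__main__":
    # One query's raw scores over four key tokens (made-up numbers).
    raw_scores = [2.0, 0.5, 0.1, 1.0]
    before = softmax(raw_scores)
    after = rescale_attention(raw_scores, focus_positions=[2], scale=4.0)
    print("before:", [round(w, 3) for w in before])
    print("after: ", [round(w, 3) for w in after])
```

Because the adjustment happens on the attention weights at inference time, a change of this kind needs no gradient updates, which is the property the abstract emphasizes. How SRA actually selects positions and scaling factors is described in the paper itself.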

Code & Models

Repositories

yileijin/attention-transition (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Focus