Pause-Tuning for Long-Context Comprehension: A Lightweight Approach to LLM Attention Recalibration

James Begin; Namit Agrawal; Eshan Singh; Yicheng Fu; Sean O'Brien; Vasu Sharma; Kevin Zhu

arXiv:2502.20405·cs.CL·March 3, 2025

TL;DR

Pause-tuning is a lightweight method that improves long-context comprehension in large language models by redistributing attention through fine-tuning with artificially inserted pause tokens, significantly enhancing performance on lengthy inputs.

Contribution

This paper introduces pause-tuning, a novel fine-tuning technique that addresses the Lost-in-the-Middle problem by redistributing attention in LLMs for better long-context understanding.

Findings

01

Significant performance improvements on the Needle-in-a-Haystack benchmark.

02

The LLaMA 3.2 3B Instruct model improves by 10.61% on this benchmark.

03

The LLaMA 3.1 8B Instruct model improves by 3.57% on this benchmark.

Abstract

LLMs have demonstrated remarkable proficiency in understanding tasks but continue to struggle with long-context comprehension, particularly with content located in the middle of extensive inputs. This limitation, known as the Lost-in-the-Middle (LITM) problem, hinders models from fully processing and utilizing information across lengthy contexts. To address this issue, we introduce pause-tuning, a technique that redistributes attention to enhance comprehension of long-context inputs. Our approach involves fine-tuning language models on datasets with artificially inserted pause tokens, which serve to segment the input into smaller, more manageable parts. We evaluate pause-tuning against alternative approaches using the Needle-in-a-Haystack benchmark, where models must retrieve information embedded within contexts of up to 128K tokens. Experimental results demonstrate significant…
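
The idea of segmenting an input with pause tokens can be illustrated with a minimal sketch. The snippet below is not the authors' implementation: the token string `<pause>`, the function name `insert_pause_tokens`, and the fixed word-count interval are all assumptions made for illustration, since the abstract does not say where or how often pauses are placed.

```python
from typing import List

# Hypothetical pause marker; the actual token used in the paper is not given here.
PAUSE_TOKEN = "<pause>"


def insert_pause_tokens(
    text: str,
    every_n_words: int = 50,
    pause_token: str = PAUSE_TOKEN,
) -> str:
    """Insert a pause token after every `every_n_words` whitespace-separated words.

    This is an assumed placement rule, used only to show how a long input
    could be split into smaller segments before fine-tuning.
    """
    if every_n_words <= 0:
        raise ValueError("every_n_words must be a positive integer")

    words = text.split()
    segmented: List[str] = []
    for index, word in enumerate(words, start=1):
        segmented.append(word)
        # Add a pause at each segment boundary, but never after the final word.
        if index % every_n_words == 0 and index < len(words):
            segmented.append(pause_token)
    return " ".join(segmented)


if __name__ == "__main__":
    sample = "one two three four five six seven"
    print(insert_pause_tokens(sample, every_n_words=3))
```

In a fine-tuning setup, the pause marker would typically also need to be registered with the tokenizer as a special token so that it maps to a single ID; that step depends on the training framework and is left out of this sketch.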


Taxonomy

Methods: Softmax · Attention Is All You Need · LLaMA