Found in the Middle: Calibrating Positional Attention Bias Improves Long Context Utilization

Cheng-Yu Hsieh; Yung-Sung Chuang; Chun-Liang Li; Zifeng Wang; Long T. Le; Abhishek Kumar; James Glass; Alexander Ratner; Chen-Yu Lee; Ranjay Krishna; Tomas Pfister

arXiv:2406.16008·cs.CL·July 4, 2024

TL;DR

This paper identifies a U-shaped attention bias in LLMs that favors the beginning and end of the input, and introduces a calibration method that increases attention to relevant content in the middle of the context, improving performance on long-input and retrieval tasks.
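A U-shaped bias like this is usually made visible by measuring how much attention the question tokens pay to each retrieved document as a function of the document's position. The sketch below shows that aggregation step only, assuming a single (seq_len x seq_len) attention matrix (one head, or heads averaged) and known token spans for each document; the function `attention_per_document`, its arguments, and the toy weights are hypothetical illustrations, not taken from the paper.

```python
import numpy as np


def attention_per_document(attn, query_idx, doc_spans):
    """Average attention mass that query tokens place on each document.

    attn:      (seq_len, seq_len) array; row i holds token i's attention
               weights over all positions (assumed: one head or a head average).
    query_idx: indices of the question tokens whose rows are aggregated.
    doc_spans: list of (start, end) half-open token ranges, one per document.
    """
    rows = np.asarray(attn, dtype=float)[query_idx]  # (num_query, seq_len)
    scores = []
    for start, end in doc_spans:
        # Total attention landing on this document, averaged over query tokens.
        scores.append(rows[:, start:end].sum(axis=1).mean())
    return np.array(scores)


# Toy example (made-up weights): 3 documents of 2 tokens each, then 1 query token.
attn = np.zeros((7, 7))
attn[6] = [0.30, 0.10, 0.05, 0.05, 0.10, 0.30, 0.10]
# The middle document receives the least attention, a U shape over positions.
print(attention_per_document(attn, [6], [(0, 2), (2, 4), (4, 6)]))
```

With a real model, such a matrix could be taken, for example, from a Hugging Face Transformers forward pass with `output_attentions=True`; which layer to use and how to combine heads are choices this sketch leaves open.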

Contribution

The paper uncovers an intrinsic positional attention bias in LLMs and proposes a calibration mechanism that improves attention to information in the middle of the context and overall long-input performance.

Findings

01

Calibration improves attention to middle context regions.

02

Enhanced retrieval-augmented generation performance.

03

Up to 15 percentage points of improvement over existing methods.

Abstract

Large language models (LLMs), even when specifically trained to process long input contexts, struggle to capture relevant information located in the middle of their input. This phenomenon is known as the lost-in-the-middle problem. In this work, we make three contributions. First, we set out to understand the factors that cause this phenomenon. In doing so, we establish a connection between lost-in-the-middle and LLMs' intrinsic attention bias: LLMs exhibit a U-shaped attention bias in which the tokens at the beginning and at the end of their input receive higher attention, regardless of their relevance. Second, we mitigate this positional bias through a calibration mechanism, found-in-the-middle, that allows the model to attend to contexts faithfully according to their relevance, even when they are in the middle. Third, we show found-in-the-middle not only achieves better…
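One way to make the calibration idea concrete is to assume that the attention a document receives is roughly its relevance plus a position-dependent bias. If that bias is estimated at each slot by the attention a content-free placeholder ("dummy") document gets there, subtracting it leaves a position-corrected score that can be used to rerank retrieved documents before generation. The additive model, the dummy-document estimate, the helper names `calibrate` and `rerank`, and the toy numbers are all illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np


def calibrate(observed, dummy):
    """Subtract an estimated positional bias from per-document attention.

    observed[k]: attention the query pays to the document placed at position k.
    dummy[k]:    attention a content-free placeholder receives at position k,
                 used here as the estimate of the positional bias.
    Assumes (hypothetically) observed ~= relevance + bias(position).
    """
    return np.asarray(observed, dtype=float) - np.asarray(dummy, dtype=float)


def rerank(docs, observed, dummy):
    """Return documents sorted by calibrated attention, highest first."""
    scores = calibrate(observed, dummy)
    order = np.argsort(-scores, kind="stable")  # stable: ties keep input order
    return [docs[i] for i in order]


# Toy numbers (made up): documents d0..d4 in context order.
docs = ["d0", "d1", "d2", "d3", "d4"]
observed = [0.30, 0.12, 0.20, 0.09, 0.35]  # the two ends get the most attention
dummy = [0.28, 0.08, 0.06, 0.08, 0.30]     # U-shaped bias with no relevant content
print(rerank(docs, observed, dummy))  # ['d2', 'd4', 'd1', 'd0', 'd3']
```

Ranked by raw attention, the end documents d4 and d0 come first; after subtracting the estimated bias, d2 from the middle of the context moves to the top, which is the kind of correction the abstract describes.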

Videos

Found in the Middle: Calibrating Positional Attention Bias Improves Long Context Utilization (Underline)

Taxonomy

Topics: Neural and Behavioral Psychology Studies

Methods: Softmax · Attention Is All You Need · Sparse Evolutionary Training