Lost in the Middle: How Language Models Use Long Contexts

Nelson F. Liu; Kevin Lin; John Hewitt; Ashwin Paranjape; Michele Bevilacqua; Fabio Petroni; Percy Liang

arXiv:2307.03172 · cs.CL · November 22, 2023 · 58 cites

Open Access · 5 Repos · 10 Models · 1 Dataset

TL;DR

This paper investigates how current language models use long input contexts. Performance drops when the relevant information sits in the middle of a long input, which shows that these models do not make robust use of their full context.

Contribution

The study analyzes how language model performance on multi-document question answering and key-value retrieval changes with the position of the relevant information, identifies positional biases, and proposes new evaluation protocols for future models.

Findings

01 Performance is often highest when relevant information is at the beginning or end of the context

02 Performance degrades significantly when relevant information is in the middle

03 Current models do not robustly use long contexts

Abstract

While recent language models have the ability to take long contexts as input, relatively little is known about how well they use longer context. We analyze the performance of language models on two tasks that require identifying relevant information in their input contexts: multi-document question answering and key-value retrieval. We find that performance can degrade significantly when changing the position of relevant information, indicating that current language models do not robustly make use of information in long input contexts. In particular, we observe that performance is often highest when relevant information occurs at the beginning or end of the input context, and significantly degrades when models must access relevant information in the middle of long contexts, even for explicitly long-context models. Our analysis provides a better understanding of how language models use…
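The key-value retrieval task named in the abstract can be illustrated with a small sketch. The Python below is not the authors' code. It assumes a JSON object of random UUID key-value pairs, a hypothetical prompt wording, and a hypothetical `model_fn` callable that maps a prompt string to a response string. It places the queried pair at a chosen position so that accuracy can be compared across positions.

```python
import json
import random
import uuid


def build_kv_prompt(num_pairs, target_index, seed=0):
    """Build a synthetic key-value retrieval prompt (illustrative only).

    Returns (prompt, expected_value). The queried pair is the one at
    position `target_index` in the serialized JSON object.
    """
    if not 0 <= target_index < num_pairs:
        raise ValueError("target_index must be in [0, num_pairs)")
    rng = random.Random(seed)
    # Seeded random UUID strings, so the same arguments give the same prompt.
    pairs = [
        (str(uuid.UUID(int=rng.getrandbits(128))),
         str(uuid.UUID(int=rng.getrandbits(128))))
        for _ in range(num_pairs)
    ]
    query_key, expected_value = pairs[target_index]
    # dict keeps insertion order, so the pair stays at target_index.
    data = json.dumps(dict(pairs), indent=1)
    # Hypothetical prompt wording, chosen for this sketch.
    prompt = (
        "Extract the value corresponding to the specified key "
        "in the JSON object below.\n\n"
        f"{data}\n\n"
        f'Key: "{query_key}"\nCorresponding value:'
    )
    return prompt, expected_value


def accuracy_by_position(model_fn, num_pairs, positions, trials=5):
    """Fraction of trials, per position, where the response contains the value."""
    results = {}
    for pos in positions:
        correct = 0
        for trial in range(trials):
            prompt, expected = build_kv_prompt(num_pairs, pos, seed=trial)
            if expected in model_fn(prompt):
                correct += 1
        results[pos] = correct / trials
    return results
```

Calling `accuracy_by_position(model_fn, 75, [0, 37, 74])` with a real model wrapped as `model_fn` would give one accuracy figure each for the start, middle, and end of the context. The pair count and positions here are arbitrary choices for the sketch.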

Code & Models

Datasets

bzantium/LITM
dataset · 22 dl

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Graph Neural Networks