Evaluating the Sensitivity of LLMs to Prior Context

Robert Hankache; Kingsley Nketia Acheampong; Liang Song; Marek Brynda; Raad Khraishi; Greig A. Cowan

arXiv:2506.00069·cs.CL·June 3, 2025

Evaluating the Sensitivity of LLMs to Prior Context

Robert Hankache, Kingsley Nketia Acheampong, Liang Song, Marek Brynda, Raad Khraishi, Greig A. Cowan

PDF

Open Access

TL;DR

This paper investigates how extended prior context influences large language models' performance, revealing significant sensitivity and proposing strategies to mitigate context-related performance drops in multi-turn interactions.

Contribution

It introduces new benchmarks for evaluating LLM sensitivity to context variations and systematically analyzes multiple models' performance impacts.

Findings

01

Performance drops up to 73% in multi-turn settings

02

GPT-4o shows up to 32% accuracy decrease

03

Strategic placement of task description improves accuracy by up to 3.5 times

Abstract

As large language models (LLMs) are increasingly deployed in multi-turn dialogue and other sustained interactive scenarios, it is essential to understand how extended context affects their performance. Popular benchmarks, focusing primarily on single-turn question answering (QA) tasks, fail to capture the effects of multi-turn exchanges. To address this gap, we introduce a novel set of benchmarks that systematically vary the volume and nature of prior context. We evaluate multiple conventional LLMs, including GPT, Claude, and Gemini, across these benchmarks to measure their sensitivity to contextual variations. Our findings reveal that LLM performance on multiple-choice questions can degrade dramatically in multi-turn interactions, with performance drops as large as 73% for certain models. Even highly capable models such as GPT-4o exhibit up to a 32% decrease in accuracy. Notably, the…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsNatural Language Processing Techniques