Beyond Continuity: Challenges of Context Switching in Multi-Turn Dialogue with LLMs
Aditya Sinha, Harald Steck, Vito Ostuni, Matteo Rinaldi

TL;DR
This paper evaluates how well large language models detect topic shifts and select relevant context in multi-turn conversations, and shows that many models miss pivots and carry over irrelevant context from earlier turns.
Contribution
It introduces synthetic benchmarks, built from real-world datasets in varied domains, for testing LLMs' multi-turn understanding, and analyzes the zero-shot performance of ten LLMs on context-switching tasks.
Findings
Only some reasoning and strongly instructed LLMs detect pivots accurately.
Open-weight LLMs often carry stale context despite explicit cues.
All models exhibit position bias affecting context understanding.
Abstract
Users interacting with Large Language Models (LLMs) in a multi-turn conversation routinely refine their requests or pivot to new topics. LLMs, however, often miss these topic shifts and carry over irrelevant context from previous turns, leading to inaccurate responses. In this paper, we stress-test the multi-turn understanding of LLMs and study the following two sub-tasks: (1) detecting whether the user pivots or refines in the current turn, and (2) shortlisting relevant context from previous turns. To this end, we construct synthetic benchmarks based on real-world datasets from varied domains, so as to simulate context shifts of different levels of difficulty. We then evaluate the zero-shot performance of ten LLMs (open-weight, closed-source and reasoning), and demonstrate that only some reasoning and strongly instructed LLMs are accurate in detecting pivots; open-weight LLMs struggle…
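The two sub-tasks named in the abstract can be made concrete with a small scoring sketch. This is an illustration only: the excerpt does not state which metrics the paper uses, so the choice of accuracy for pivot detection and precision/recall/F1 for context shortlisting is an assumption, as are the function names `pivot_accuracy` and `shortlist_scores` and the label strings "pivot" and "refine".

```python
from typing import Dict, List, Set


def pivot_accuracy(gold: List[str], predicted: List[str]) -> float:
    """Fraction of turns whose predicted label matches the gold label.

    Labels are assumed to be the strings "pivot" or "refine" (hypothetical).
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        return 0.0
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def shortlist_scores(gold: Set[int], predicted: Set[int]) -> Dict[str, float]:
    """Precision, recall and F1 over indices of previous turns marked relevant."""
    # Assumed convention: after a full pivot no earlier turn is relevant,
    # so an empty prediction against an empty gold set counts as perfect.
    if not gold and not predicted:
        return {"precision": 1.0, "recall": 1.0, "f1": 1.0}
    hits = len(gold & predicted)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Sub-task 1: one of four turns is mislabeled.
    print(pivot_accuracy(["pivot", "refine", "refine", "pivot"],
                         ["pivot", "refine", "pivot", "pivot"]))
    # Sub-task 2: the model keeps turn 2 (correct) and turn 3 (stale), and drops turn 0.
    print(shortlist_scores({0, 2}, {2, 3}))
```

Under this framing, a model that carries stale context shows up as low shortlisting precision, while a model that misses a pivot shows up as an error on the first metric.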
