Evaluating Temporal Consistency in Multi-Turn Language Models

Yash Kumar Atri; Steven L. Johnson; Tom Hartvigsen

arXiv:2604.23051·cs.CL·April 28, 2026

TL;DR

This paper introduces ChronoScope, a benchmark that tests whether language models maintain and update temporal context across dialogue turns. It finds that models frequently lose the time scope established earlier in a conversation, and that the problem grows with longer interactions.

Contribution

The paper presents ChronoScope, a large-scale diagnostic benchmark of over one million deterministically generated question chains grounded in Wikidata, for evaluating temporal scope stability in multi-turn language models.

Findings

01. Models often drift toward present-day assumptions despite correct knowledge.

02. Temporal stability issues increase with longer interactions.

03. Failures persist even with oracle context, indicating a gap in temporal reasoning.

Abstract

Language models are increasingly deployed in interactive settings where users reason about facts over time rather than in isolation. In such scenarios, correct behavior requires models to maintain and update implicit temporal assumptions established earlier in a conversation. We study this challenge through the lens of temporal scope stability: the ability to preserve, override, or transfer time-scoped factual context across dialogue turns. We introduce ChronoScope, a large-scale diagnostic benchmark designed to isolate temporal scope behavior in controlled multi-turn interactions, comprising over one million deterministically generated question chains grounded in Wikidata. ChronoScope evaluates whether models can correctly retain inferred temporal scope when follow-up questions omit explicit time references, spanning implicit carryover, explicit scope switching, cross-entity transfer,…
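To make the idea of temporal scope concrete, the following is a minimal sketch of how a gold answer for each turn of a question chain could be derived. It is an illustration under stated assumptions, not the ChronoScope generator: the `ScopedFact` and `Turn` types, the `lookup` and `gold_answers` helpers, and the placeholder facts ("Org X", "Person A", and so on) are all hypothetical and do not come from the paper or its repository. The sketch covers three of the behaviors named in the abstract: implicit carryover, explicit scope switching, and cross-entity transfer.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ScopedFact:
    """Hypothetical time-scoped statement (loosely in the spirit of Wikidata qualifiers)."""
    subject: str
    relation: str
    value: str
    start_year: int
    end_year: int  # inclusive


@dataclass(frozen=True)
class Turn:
    """One question in a chain; year=None means the turn omits any time reference."""
    subject: str
    relation: str
    year: Optional[int] = None


# Placeholder facts for illustration only; these are NOT benchmark data.
FACTS = [
    ScopedFact("Org X", "head", "Person A", 2000, 2009),
    ScopedFact("Org X", "head", "Person B", 2010, 2020),
    ScopedFact("Org Y", "head", "Person C", 2000, 2004),
    ScopedFact("Org Y", "head", "Person D", 2005, 2020),
]


def lookup(subject: str, relation: str, year: int) -> Optional[str]:
    """Return the value that held for (subject, relation) in the given year, if any."""
    for fact in FACTS:
        same_key = (fact.subject, fact.relation) == (subject, relation)
        if same_key and fact.start_year <= year <= fact.end_year:
            return fact.value
    return None


def gold_answers(chain: List[Turn]) -> List[Optional[str]]:
    """Resolve each turn against the temporal scope active at that point in the chain."""
    scope: Optional[int] = None
    answers = []
    for turn in chain:
        if turn.year is not None:
            scope = turn.year  # explicit scope switch overrides the earlier scope
        if scope is None:
            raise ValueError("no temporal scope has been established yet")
        # When turn.year is None, the earlier scope carries over implicitly.
        answers.append(lookup(turn.subject, turn.relation, scope))
    return answers


chain = [
    Turn("Org X", "head", 2003),  # establishes the scope: 2003
    Turn("Org Y", "head"),        # cross-entity transfer, scope carried over (2003)
    Turn("Org X", "head", 2015),  # explicit switch to 2015
    Turn("Org Y", "head"),        # implicit carryover of the new scope (2015)
]
print(gold_answers(chain))  # ['Person A', 'Person C', 'Person B', 'Person D']
```

In this toy chain, a model that drifts toward the most recent state would answer "Person D" on the second turn, whereas the scope established in the first turn calls for "Person C". That is the kind of error the findings above describe as drift toward present-day assumptions.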

Code & Models

Repositories

yashkumaratri/ChronoScope (GitHub)
