Time Awareness in Large Language Models: Benchmarking Fact Recall Across Time

David Herel; Vojtech Bartek; Jiri Jirak; Tomas Mikolov

arXiv:2409.13338 · cs.CL · May 16, 2025

Open Access

TL;DR

This paper introduces a new benchmark and evaluation framework to assess large language models' ability to recall facts accurately across different points in time, highlighting current limitations in temporal reasoning.

Contribution

We present a novel dataset and evaluation method for temporal fact recall, revealing challenges in LLMs' temporal consistency and performance across different model types.

Findings

01. Base models outperform instruction-tuned models on time-sensitive recall.

02. Large models show brittleness with paraphrased facts.

03. Temporal reasoning remains a significant challenge for LLMs.

Abstract

Who is the US President? The answer changes depending on when the question is asked. While large language models (LLMs) are evaluated on various reasoning tasks, they often miss a crucial dimension: time. In real-world scenarios, the correctness of answers is frequently tied to temporal context. To address this gap, we present a novel framework and dataset spanning over 8,000 events from 2018 to 2024, annotated with day-level granularity and sourced globally across domains such as politics, science, and business. Our TimeShift evaluation method systematically probes LLMs for temporal reasoning, revealing that base models often outperform instruction-tuned and synthetic-trained counterparts on time-sensitive recall. Additionally, we find that even large-scale models exhibit brittleness in handling paraphrased facts, highlighting unresolved challenges in temporal consistency. By…
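
The abstract does not spell out how TimeShift probes a model, so the following is only a minimal sketch of one way time-sensitive recall could be measured, not the authors' implementation. It assumes each event carries a day-level date, that a model can be wrapped in a scoring function returning a higher value for more plausible text, and that a year is prepended to the event text as a prompt prefix. The `Event` fields, the prompt template, `toy_score`, and the two fictional events are all hypothetical stand-ins.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Sequence


@dataclass(frozen=True)
class Event:
    """One time-stamped fact. Field names are illustrative, not the dataset's schema."""
    text: str
    day: date  # day-level annotation, as described in the abstract


def predicted_year(event: Event, years: Sequence[int],
                   score: Callable[[str], float]) -> int:
    """Return the candidate year whose prefixed prompt the scorer rates highest."""
    # Assumed prompt template; the paper's actual wording is not given here.
    return max(years, key=lambda y: score(f"In {y}, {event.text}"))


def year_accuracy(events: Sequence[Event], years: Sequence[int],
                  score: Callable[[str], float]) -> float:
    """Fraction of events whose highest-scoring year matches the annotated year."""
    if not events:
        raise ValueError("need at least one event")
    hits = sum(predicted_year(e, years, score) == e.day.year for e in events)
    return hits / len(events)


def toy_score(prompt: str) -> float:
    """Stand-in for a model log-likelihood: it simply favors prompts naming 2021."""
    return 0.0 if "2021" in prompt else -1.0


# Fictional events for illustration only; they are not taken from the dataset.
events = [
    Event("a fictional probe was launched.", date(2021, 3, 4)),
    Event("a fictional treaty was signed.", date(2019, 7, 1)),
]
YEARS = range(2018, 2025)  # the 2018 to 2024 span mentioned in the abstract

print(year_accuracy(events, YEARS, toy_score))
```

In practice `toy_score` would be replaced by a real model's sequence log-probability. Re-scoring a paraphrase of each event's text with the same function is one way the brittleness mentioned above could be checked.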

Taxonomy

Topics: Topic Modeling · Semantic Web and Ontologies · Advanced Text Analysis Techniques

Methods: Balanced Selection · ALIGN