Factual Knowledge in Language Models: Robustness and Anomalies under Simple Temporal Context Variations

Hichem Ammar Khodja; Frédéric Béchet; Quentin Brabant; Alexis Nasr; Gwénolé Lecorvé

arXiv:2502.01220 · cs.CL · June 24, 2025

TL;DR

This paper investigates how well language models understand and differentiate factual information across different temporal contexts, revealing significant limitations in their ability to accurately associate facts with correct time periods.

Contribution

It introduces the TimeStress dataset for evaluating temporal robustness in LMs and provides a comprehensive analysis of their limitations in temporal factual reasoning.

Findings

1. Best LM achieves perfect distinction for only 11% of facts
2. Current LMs struggle with temporal context differentiation
3. Errors made by LMs are rare but critical

Abstract

This paper explores the robustness of language models (LMs) to variations in the temporal context within factual knowledge. It examines whether LMs can correctly associate a temporal context with a past fact valid over a defined period, by asking them to differentiate correct from incorrect contexts. This ability is analyzed along two dimensions: the distance of the incorrect context from the validity period and the granularity of the context. To this end, a dataset called TimeStress is introduced, enabling the evaluation of 18 diverse LMs. Results reveal that the best LM achieves a perfect distinction for only 11% of the studied facts, and that its errors, although rare, are critical ones that humans would not make. This work highlights the limitations of current LMs in temporal representation.
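
The idea of "perfect distinction" for a single fact can be illustrated with a small sketch. It assumes that an LM assigns a score (for instance a log-probability) to the same statement under each temporal context, and that a fact counts as perfectly distinguished when every correct context scores higher than every incorrect one. The class `ScoredContext`, the functions `win_rate` and `perfectly_distinguished`, and all scores below are hypothetical and are not taken from the paper or from the TimeStress dataset.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ScoredContext:
    label: str        # temporal context, e.g. "In 2011"
    is_correct: bool  # True if the context falls inside the validity period
    score: float      # hypothetical LM score for the statement with this context


def win_rate(contexts: List[ScoredContext]) -> float:
    """Fraction of (correct, incorrect) pairs where the correct context scores higher."""
    correct = [c.score for c in contexts if c.is_correct]
    incorrect = [c.score for c in contexts if not c.is_correct]
    if not correct or not incorrect:
        raise ValueError("need at least one correct and one incorrect context")
    wins = sum(1 for c in correct for i in incorrect if c > i)
    return wins / (len(correct) * len(incorrect))


def perfectly_distinguished(contexts: List[ScoredContext]) -> bool:
    """Assumed criterion: every correct context outscores every incorrect one."""
    correct = [c.score for c in contexts if c.is_correct]
    incorrect = [c.score for c in contexts if not c.is_correct]
    if not correct or not incorrect:
        raise ValueError("need at least one correct and one incorrect context")
    return min(correct) > max(incorrect)


# Made-up scores for a fact assumed valid from 2010 to 2014.
example = [
    ScoredContext("In 2011", True, -1.0),
    ScoredContext("In 2013", True, -1.5),
    ScoredContext("In 1995", False, -2.0),  # far from the validity period
    ScoredContext("In 2015", False, -1.2),  # just outside the validity period
]

print(win_rate(example))
print(perfectly_distinguished(example))
```

In this made-up example the near-miss context "In 2015" outscores one correct context, so the fact is not perfectly distinguished even though most pairwise comparisons come out right. Under these assumptions, a single failed comparison is enough to break perfect distinction for a fact.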

Code & Models

Datasets

Orange/TimeStress
dataset · 55 downloads

Videos

Factual Knowledge in Language Models: Robustness and Anomalies under Simple Temporal Context Variations · Underline

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Computational and Text Analysis Methods