TRAVELER: A Benchmark for Evaluating Temporal Reasoning across Vague, Implicit and Explicit References

Svenja Kenneweg; Jörg Deigmöller; Philipp Cimiano; Julian Eggert

arXiv:2505.01325·cs.CL·May 5, 2025

Open Access

TL;DR

TRAVELER is a new synthetic benchmark dataset designed to evaluate how well language models understand and resolve explicit, implicit, and vague temporal references across event sets of different sizes.

Contribution

It introduces TRAVELER, a comprehensive benchmark for systematic evaluation of temporal reasoning in language models, covering diverse temporal reference types and event set complexities.

Findings

01. LLMs perform well on explicit references with small event sets.
02. Performance drops as event set size increases and references become less explicit.
03. Vague temporal references pose the greatest challenge for current models.

Abstract

Understanding and resolving temporal references is essential in Natural Language Understanding as we often refer to the past or future in daily communication. Although existing benchmarks address a system's ability to reason about and resolve temporal references, systematic evaluation of specific temporal references remains limited. Towards closing this gap, we introduce TRAVELER, a novel synthetic benchmark dataset that follows a Question Answering paradigm and consists of questions involving temporal references with the corresponding correct answers. TRAVELER assesses models' abilities to resolve explicit, implicit relative to speech time, and vague temporal references. Beyond investigating the performance of state-of-the-art LLMs depending on the type of temporal reference, our benchmark also allows evaluation of performance in relation to the length of the set of events. For the…
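The abstract describes the benchmark only at a high level. It consists of question-answer pairs over a set of events, and results are analysed by reference type and by event-set length. The Python sketch below illustrates that setup under loudly stated assumptions. The names `Event`, `QAItem`, `resolve_implicit`, `events_in`, `is_correct` and `accuracy_by_group` are hypothetical and do not come from the TRAVELER release. Three choices are illustrative only: the two implicit references handled ("yesterday", and "last week" read as the previous Monday-to-Sunday week), resolution relative to a given speech time, and exact-set-match scoring. Vague references are left out, since a phrase such as "recently" does not map to one fixed date interval.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Event:
    description: str
    day: date


@dataclass(frozen=True)
class QAItem:
    # Hypothetical item layout; not the benchmark's actual schema.
    question: str
    reference_type: str        # assumed labels: "explicit", "implicit", "vague"
    speech_time: date          # implicit references are resolved relative to this
    events: tuple[Event, ...]
    gold: frozenset[str]       # descriptions of the events the reference points to


def resolve_implicit(reference: str, speech_time: date) -> tuple[date, date]:
    """Map a few hypothetical implicit references to an inclusive date interval."""
    if reference == "yesterday":
        d = speech_time - timedelta(days=1)
        return d, d
    if reference == "last week":
        # Assumption: "last week" = the previous Monday-to-Sunday calendar week.
        this_monday = speech_time - timedelta(days=speech_time.weekday())
        return this_monday - timedelta(days=7), this_monday - timedelta(days=1)
    raise ValueError(f"unsupported reference: {reference!r}")


def events_in(events: tuple[Event, ...], interval: tuple[date, date]) -> tuple[str, ...]:
    """Return descriptions of events whose day falls inside the inclusive interval."""
    lo, hi = interval
    return tuple(e.description for e in events if lo <= e.day <= hi)


def is_correct(item: QAItem, predicted: set[str]) -> bool:
    # Exact set match between predicted and gold events (an assumed metric).
    return set(predicted) == item.gold


def accuracy_by_group(results):
    """results: iterable of (reference_type, n_events, is_correct) triples."""
    totals = defaultdict(lambda: [0, 0])  # key -> [n_correct, n_total]
    for ref_type, n_events, correct in results:
        bucket = totals[(ref_type, n_events)]
        bucket[0] += int(correct)
        bucket[1] += 1
    return {key: c / n for key, (c, n) in totals.items()}


if __name__ == "__main__":
    speech = date(2024, 3, 14)  # a Thursday
    evs = (
        Event("dentist", date(2024, 3, 13)),
        Event("concert", date(2024, 3, 5)),
        Event("dinner", date(2024, 3, 10)),
    )
    print(events_in(evs, resolve_implicit("yesterday", speech)))  # ('dentist',)
    print(events_in(evs, resolve_implicit("last week", speech)))  # ('concert', 'dinner')
```

Grouping accuracy by the pair (reference type, event-set size) mirrors the two axes the abstract names. How TRAVELER itself scores model answers is not stated in this summary.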

