TRAVELER: A Benchmark for Evaluating Temporal Reasoning across Vague, Implicit and Explicit References
Svenja Kenneweg, Jörg Deigmöller, Philipp Cimiano, Julian Eggert

TL;DR
TRAVELER is a new synthetic benchmark dataset designed to evaluate how well language models understand and resolve explicit, implicit, and vague temporal references across event sets of different sizes.
Contribution
The paper introduces TRAVELER, a benchmark for systematically evaluating temporal reasoning in language models across explicit, implicit, and vague temporal references and across event sets of varying size.
Findings
LLMs perform well on explicit references with small event sets
Performance drops as event set size increases and references become less explicit
Vague temporal references pose the greatest challenge for current models
Abstract
Understanding and resolving temporal references is essential in Natural Language Understanding as we often refer to the past or future in daily communication. Although existing benchmarks address a system's ability to reason about and resolve temporal references, systematic evaluation of specific temporal references remains limited. Towards closing this gap, we introduce TRAVELER, a novel synthetic benchmark dataset that follows a Question Answering paradigm and consists of questions involving temporal references with the corresponding correct answers. TRAVELER assesses models' abilities to resolve explicit, implicit relative to speech time, and vague temporal references. Beyond investigating the performance of state-of-the-art LLMs depending on the type of temporal reference, our benchmark also allows evaluation of performance in relation to the length of the set of events. For the…
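To make the distinction between reference types concrete, here is a minimal sketch of how an explicit reference (a stated date) and an implicit reference (a date given relative to speech time, such as "yesterday") could be resolved against a small set of events. The `Event` class, the helper functions, and the sample events are all hypothetical illustrations and are not taken from the TRAVELER dataset or its code. Vague references are left out because the text above does not say how they are resolved or scored.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Event:
    """Hypothetical event record: the day it happened and what happened."""
    day: date
    description: str


def resolve_explicit(events, day):
    """Return descriptions of events on an explicitly stated date."""
    return [e.description for e in events if e.day == day]


def resolve_implicit(events, speech_time, offset_days):
    """Return descriptions of events on a date given relative to speech time.

    offset_days=-1 corresponds to "yesterday", 0 to "today".
    """
    target = speech_time + timedelta(days=offset_days)
    return resolve_explicit(events, target)


# Illustrative event set (assumed data, not from the benchmark).
events = [
    Event(date(2024, 5, 1), "watered the plants"),
    Event(date(2024, 5, 2), "cooked pasta"),
    Event(date(2024, 5, 2), "cleaned the kitchen"),
]
speech_time = date(2024, 5, 3)

# Explicit: "What happened on 1 May 2024?"
explicit_answer = resolve_explicit(events, date(2024, 5, 1))

# Implicit: "What happened yesterday?" (relative to speech_time)
implicit_answer = resolve_implicit(events, speech_time, offset_days=-1)
```

In this sketch the implicit case reduces to the explicit one once the speech time is known. A model answering in natural language has to perform that extra step itself.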
Taxonomy
Topics: Bayesian Modeling and Causal Inference · Intelligent Tutoring Systems and Adaptive Learning · Cognitive Science and Mapping
Methods: Sparse Evolutionary Training
