UnSeenTimeQA: Time-Sensitive Question-Answering Beyond LLMs' Memorization

Md Nayem Uddin; Amir Saeidi; Divij Handa; Agastya Seth; Tran Cao Son; Eduardo Blanco; Steven R. Corman; Chitta Baral

arXiv:2407.03525·cs.CL·June 4, 2025

TL;DR

UnSeenTimeQA is a new benchmark for testing large language models' ability to perform time-sensitive reasoning without relying on pre-existing knowledge, using synthetic, contamination-free data.

Contribution

The paper introduces UnSeenTimeQA, a novel, contamination-free TSQA benchmark with synthetic data and a flexible generation framework for evaluating LLMs' temporal reasoning.

Findings

01. LLMs perform well on simple questions but struggle with complex event dependencies.

02. Performance on synthetic fact-based TSQA is lower than on real-world fact-based TSQA.

03. Error analysis highlights difficulties in reasoning over long-range and parallel events.

Abstract

This paper introduces UnSeenTimeQA, a novel data contamination-free time-sensitive question-answering (TSQA) benchmark. It differs from existing TSQA benchmarks by avoiding web-searchable queries grounded in the real world. We present a series of time-sensitive event scenarios based on synthetically generated facts. It requires large language models (LLMs) to engage in genuine temporal reasoning without depending on the factual knowledge acquired during the pre-training phase. Our data generation framework enables on-demand generation of new samples, mitigating the risk of data leakage. We designed three types of time-sensitive questions to test LLMs' temporal reasoning abilities over sequential and parallel event occurrences. Our evaluation of five LLMs on synthetic fact-based TSQA reveals mixed results: while they perform well on simpler subsets, their overall performance remains…

Code & Models

Datasets

nurakib/UnSeenTimeQA
dataset · 54 downloads

Videos

UnSeenTimeQA: Time-Sensitive Question-Answering Beyond LLMs' Memorization

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Semantic Web and Ontologies