UnSeenTimeQA: Time-Sensitive Question-Answering Beyond LLMs' Memorization
Md Nayem Uddin, Amir Saeidi, Divij Handa, Agastya Seth, Tran Cao Son, Eduardo Blanco, Steven R. Corman, Chitta Baral

TL;DR
UnSeenTimeQA is a new benchmark for testing large language models' ability to perform time-sensitive reasoning without relying on pre-existing knowledge, using synthetic, contamination-free data.
Contribution
The paper introduces UnSeenTimeQA, a novel, contamination-free TSQA benchmark with synthetic data and a flexible generation framework for evaluating LLMs' temporal reasoning.
Findings
LLMs perform well on simple questions but struggle with complex event dependencies.
Performance on synthetic facts is lower than on real-world fact-based TSQA.
Error analysis highlights difficulties in reasoning over long-range and parallel events.
Abstract
This paper introduces UnSeenTimeQA, a novel data contamination-free time-sensitive question-answering (TSQA) benchmark. It differs from existing TSQA benchmarks by avoiding web-searchable queries grounded in the real world. We present a series of time-sensitive event scenarios based on synthetically generated facts. It requires large language models (LLMs) to engage in genuine temporal reasoning without depending on the factual knowledge acquired during the pre-training phase. Our data generation framework enables on-demand generation of new samples, mitigating the risk of data leakage. We designed three types of time-sensitive questions to test LLMs' temporal reasoning abilities over sequential and parallel event occurrences. Our evaluation of five LLMs on synthetic fact-based TSQA reveals mixed results: while they perform well on simpler subsets, their overall performance remains…
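The idea of generating contamination-free samples on demand can be illustrated with a toy generator. This is a minimal sketch under stated assumptions, not the authors' framework: the `Event` class, the "on duty" scenario, the made-up name generator, and every function name below are hypothetical and chosen only for illustration. It covers sequential events only, and shows how a seed can produce a fresh scenario whose gold answer is computed from the generated timeline rather than from real-world facts.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Event:
    subject: str  # made-up entity name (hypothetical, not from the paper)
    start: int    # inclusive start, in minutes from the scenario origin
    end: int      # exclusive end


def make_name(rng: random.Random) -> str:
    # Random consonant-vowel syllables give names unlikely to match real entities.
    syllables = [c + v for c in "bdkmrtz" for v in "aeiou"]
    return "".join(rng.choice(syllables) for _ in range(3)).capitalize()


def generate_scenario(seed: int, n_events: int = 4) -> List[Event]:
    """Build a chain of back-to-back (sequential) events with random durations."""
    rng = random.Random(seed)
    events, clock = [], 0
    for _ in range(n_events):
        duration = rng.randint(10, 60)
        events.append(Event(make_name(rng), clock, clock + duration))
        clock += duration
    return events


def answer_at(events: List[Event], t: int) -> Optional[str]:
    """Gold answer: the subject whose interval contains time t, if any."""
    for ev in events:
        if ev.start <= t < ev.end:
            return ev.subject
    return None


def make_question(events: List[Event], seed: int) -> Tuple[str, str, Optional[str]]:
    """Render the timeline as text and ask about a random minute inside it."""
    rng = random.Random(seed)
    t = rng.randrange(events[0].start, events[-1].end)
    context = " ".join(
        f"{ev.subject} is on duty from minute {ev.start} to minute {ev.end}."
        for ev in events
    )
    question = f"Who is on duty at minute {t}?"
    return context, question, answer_at(events, t)


if __name__ == "__main__":
    scenario = generate_scenario(seed=0)
    context, question, gold = make_question(scenario, seed=1)
    print(context)
    print(question)
    print("gold:", gold)
```

Changing the seed yields a new scenario with new names and durations, so a model cannot answer from memorized facts and must reason over the timeline given in the context.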
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Semantic Web and Ontologies
