A Question Answering Dataset for Temporal-Sensitive Retrieval-Augmented Generation

Ziyang Chen; Erxue Min; Xiang Zhao; Yunxin Li; Xin Jia; Jinzhi Liao; Jichao Li; Shuaiqiang Wang; Baotian Hu; Dawei Yin

arXiv:2508.12282 · cs.CL · August 19, 2025

TL;DR

ChronoQA is a large-scale Chinese question answering dataset designed to evaluate temporal reasoning in retrieval-augmented generation systems, supporting diverse temporal types and multi-document scenarios.

Contribution

The paper introduces ChronoQA, a benchmark dataset for temporal reasoning in RAG systems. It provides structural annotations covering explicit and implicit temporal expressions, and its quality is checked through multi-stage validation (rule-based, LLM-based, and human).

Findings

01. Dataset contains 5,176 questions built from over 300,000 news articles published between 2019 and 2024.

02. Supports multiple temporal reasoning types (absolute, aggregate, and relative) in both single- and multi-document scenarios.

03. Validated through rule-based, LLM-based, and human evaluations.
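
To make the findings above concrete, here is a minimal Python sketch of what a ChronoQA-style record and a rule-based check might look like. The class `ChronoRecord`, its field names, and the specific rules in `rule_check` are assumptions made for illustration; the listing does not give the dataset's actual schema or validation rules. Only the category names and the 2019 to 2024 date range come from the abstract.

```python
from dataclasses import dataclass
from datetime import date
from typing import List

# Category names taken from the abstract; everything else is hypothetical.
TEMPORAL_TYPES = {"absolute", "aggregate", "relative"}
EXPRESSION_TYPES = {"explicit", "implicit"}


@dataclass
class ChronoRecord:
    """Hypothetical record layout, not the dataset's real schema."""
    question: str
    answer: str
    temporal_type: str      # one of TEMPORAL_TYPES
    expression_type: str    # one of EXPRESSION_TYPES
    doc_dates: List[date]   # publication dates of the supporting articles


def rule_check(rec: ChronoRecord) -> List[str]:
    """Return a list of rule violations; an empty list means the record passes."""
    problems = []
    if rec.temporal_type not in TEMPORAL_TYPES:
        problems.append("unknown temporal_type")
    if rec.expression_type not in EXPRESSION_TYPES:
        problems.append("unknown expression_type")
    if not rec.question.strip() or not rec.answer.strip():
        problems.append("empty question or answer")
    if not rec.doc_dates:
        problems.append("no supporting documents")
    for d in rec.doc_dates:
        # The abstract states the source articles span 2019 to 2024.
        if not 2019 <= d.year <= 2024:
            problems.append(f"document date {d.isoformat()} outside 2019-2024")
    return problems


def is_multi_document(rec: ChronoRecord) -> bool:
    """A record counts as multi-document if it cites more than one article."""
    return len(rec.doc_dates) > 1
```

A check of this kind would only be the first of the three validation stages; the LLM-based and human stages are not modeled here.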

Abstract

We introduce ChronoQA, a large-scale benchmark dataset for Chinese question answering, specifically designed to evaluate temporal reasoning in Retrieval-Augmented Generation (RAG) systems. ChronoQA is constructed from over 300,000 news articles published between 2019 and 2024, and contains 5,176 high-quality questions covering absolute, aggregate, and relative temporal types with both explicit and implicit time expressions. The dataset supports both single- and multi-document scenarios, reflecting the real-world requirements for temporal alignment and logical consistency. ChronoQA features comprehensive structural annotations and has undergone multi-stage validation, including rule-based, LLM-based, and human evaluation, to ensure data quality. By providing a dynamic, reliable, and scalable resource, ChronoQA enables structured evaluation across a wide range of temporal tasks, and…
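
One way to read "structured evaluation across a wide range of temporal tasks" is to report accuracy separately for each temporal category rather than as a single number. The sketch below illustrates that idea under stated assumptions: the function `accuracy_by_category` and its input format of `(category, is_correct)` pairs are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def accuracy_by_category(results: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Compute per-category accuracy from (category, is_correct) pairs.

    The category would typically be a temporal type such as "absolute",
    "aggregate", or "relative"; how correctness is judged is left to the caller.
    """
    totals: Dict[str, int] = defaultdict(int)
    correct: Dict[str, int] = defaultdict(int)
    for category, is_correct in results:
        totals[category] += 1
        correct[category] += int(is_correct)
    # Every category in totals has at least one item, so no division by zero.
    return {category: correct[category] / totals[category] for category in totals}


# Toy usage with made-up outcomes (not results from the paper).
toy_results = [("absolute", True), ("absolute", False), ("relative", True)]
per_type = accuracy_by_category(toy_results)
```

Breaking scores down this way makes it visible when a system handles explicit dates well but fails on relative or aggregate questions.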

Taxonomy

Topics: Topic Modeling · Speech and dialogue systems · Multimodal Machine Learning Applications