A Question Answering Dataset for Temporal-Sensitive Retrieval-Augmented Generation
Ziyang Chen, Erxue Min, Xiang Zhao, Yunxin Li, Xin Jia, Jinzhi Liao, Jichao Li, Shuaiqiang Wang, Baotian Hu, Dawei Yin

TL;DR
ChronoQA is a large-scale Chinese question answering dataset designed to evaluate temporal reasoning in retrieval-augmented generation systems, supporting diverse temporal types and multi-document scenarios.
Contribution
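The paper introduces ChronoQA, a benchmark dataset for temporal reasoning in RAG systems. Its questions cover absolute, aggregate, and relative temporal types with both explicit and implicit time expressions, and its annotations are validated through rule-based, LLM-based, and human evaluation.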
Findings
Dataset contains 5,176 questions built from over 300,000 news articles published between 2019 and 2024.
Supports multiple temporal reasoning types and multi-document scenarios.
Validated through rule-based, LLM-based, and human evaluations.
Abstract
We introduce ChronoQA, a large-scale benchmark dataset for Chinese question answering, specifically designed to evaluate temporal reasoning in Retrieval-Augmented Generation (RAG) systems. ChronoQA is constructed from over 300,000 news articles published between 2019 and 2024, and contains 5,176 high-quality questions covering absolute, aggregate, and relative temporal types with both explicit and implicit time expressions. The dataset supports both single- and multi-document scenarios, reflecting the real-world requirements for temporal alignment and logical consistency. ChronoQA features comprehensive structural annotations and has undergone multi-stage validation, including rule-based, LLM-based, and human evaluation, to ensure data quality. By providing a dynamic, reliable, and scalable resource, ChronoQA enables structured evaluation across a wide range of temporal tasks, and…
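As a rough illustration of what "structured evaluation" over these categories could look like, the sketch below groups questions by temporal type, expression style, and single- versus multi-document scenario, then reports accuracy per group. The field names, the `TemporalQA` class, the `accuracy_by_category` helper, and the exact-match scoring rule are all assumptions made for this example; they are not the dataset's actual schema or the paper's metric, and the sample records are invented.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Category labels taken from the abstract; the field layout itself is hypothetical.
TEMPORAL_TYPES = {"absolute", "aggregate", "relative"}
EXPRESSIONS = {"explicit", "implicit"}


@dataclass(frozen=True)
class TemporalQA:
    """One hypothetical question record (not the real ChronoQA schema)."""
    question: str
    answer: str
    temporal_type: str          # "absolute" | "aggregate" | "relative"
    expression: str             # "explicit" | "implicit"
    doc_ids: Tuple[str, ...]    # supporting news articles

    def __post_init__(self) -> None:
        if self.temporal_type not in TEMPORAL_TYPES:
            raise ValueError(f"unknown temporal type: {self.temporal_type}")
        if self.expression not in EXPRESSIONS:
            raise ValueError(f"unknown expression style: {self.expression}")

    @property
    def scenario(self) -> str:
        # More than one supporting article means a multi-document question.
        return "multi" if len(self.doc_ids) > 1 else "single"


def accuracy_by_category(
    items: List[TemporalQA], predictions: List[str]
) -> Dict[Tuple[str, str, str], float]:
    """Exact-match accuracy per (temporal_type, expression, scenario).

    Exact match is an assumption chosen for simplicity.
    """
    if len(items) != len(predictions):
        raise ValueError("items and predictions must have the same length")
    hits: Dict[Tuple[str, str, str], int] = defaultdict(int)
    totals: Dict[Tuple[str, str, str], int] = defaultdict(int)
    for item, pred in zip(items, predictions):
        key = (item.temporal_type, item.expression, item.scenario)
        totals[key] += 1
        hits[key] += int(pred.strip() == item.answer.strip())
    return {key: hits[key] / totals[key] for key in totals}


# Invented toy records, for illustration only.
sample = [
    TemporalQA("q1", "2021", "absolute", "explicit", ("d1",)),
    TemporalQA("q2", "3", "aggregate", "implicit", ("d1", "d2")),
]
scores = accuracy_by_category(sample, ["2021", "4"])
```

Keying the report on all three attributes keeps weaknesses visible that a single aggregate score would hide, for example a system that handles explicit absolute dates but fails on implicit multi-document questions.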
Taxonomy
Topics: Topic Modeling · Speech and dialogue systems · Multimodal Machine Learning Applications
