DetectiveQA: Evaluating Long-Context Reasoning on Detective Novels
Zhe Xu, Jiasheng Ye, Xiaoran Liu, Xiangyang Liu, Tianxiang Sun, Zhigeng Liu, Qipeng Guo, Linlin Li, Qun Liu, Xuanjing Huang, Xipeng Qiu

TL;DR
DetectiveQA is a new dataset that uses detective novels to evaluate long-context reasoning in large language models; evaluations on it highlight ongoing challenges in reasoning and evidence retrieval.
Contribution
The paper introduces DetectiveQA, a new dataset with annotated questions and reasoning steps for narrative reasoning in long texts, and proposes a step-wise reasoning metric for evaluation.
Findings
LLMs show persistent long-context reasoning challenges
Evidence retrieval remains difficult for mainstream LLMs
DetectiveQA provides a rigorous benchmark for future research
Abstract
Recently, significant effort has been devoted to enhancing the long-context capabilities of Large Language Models (LLMs), particularly long-context reasoning. To facilitate this research, we propose DetectiveQA, a dataset specifically designed for narrative reasoning within long contexts. We leverage detective novels, averaging over 100k tokens, to create a dataset containing 1200 human-annotated questions in both Chinese and English, each paired with corresponding reference reasoning steps. Furthermore, we introduce a step-wise reasoning metric, which enhances the evaluation of LLMs' reasoning processes. We validate our approach and evaluate mainstream LLMs, including GPT-4, Claude, and LLaMA, revealing persistent long-context reasoning challenges and difficulties with evidence retrieval. Our findings offer valuable insights into the study of long-context…
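The abstract does not spell out how the step-wise reasoning metric is computed. The sketch below is one hypothetical reading, assuming the score is the fraction of a question's annotated reference reasoning steps that are covered by a model's written reasoning. The record layout (`DetectiveQAItem`), the word-overlap matcher, the 0.6 threshold, and the toy example are all illustrative assumptions, not the authors' implementation.

```python
from dataclasses import dataclass, field


@dataclass
class DetectiveQAItem:
    """Hypothetical record layout: a question, its answer, and reference reasoning steps."""
    question: str
    answer: str
    reference_steps: list[str] = field(default_factory=list)


def _tokens(text: str) -> set[str]:
    # Lowercased word set, with punctuation stripped from the edges of each word.
    return {w.strip(".,;:!?\"'()").lower() for w in text.split()} - {""}


def step_matched(step: str, reasoning: str, threshold: float = 0.6) -> bool:
    """Assumed stand-in matcher: a reference step counts as covered if at least
    `threshold` of its words appear in the model's reasoning. The paper's actual
    matching procedure may differ (for example, an LLM judge)."""
    step_words = _tokens(step)
    if not step_words:
        return False
    overlap = len(step_words & _tokens(reasoning)) / len(step_words)
    return overlap >= threshold


def step_wise_score(item: DetectiveQAItem, reasoning: str) -> float:
    """Fraction of the item's reference steps covered by the model's reasoning."""
    if not item.reference_steps:
        return 0.0
    hits = sum(step_matched(s, reasoning) for s in item.reference_steps)
    return hits / len(item.reference_steps)


# Toy example (invented for illustration, not taken from the dataset).
item = DetectiveQAItem(
    question="Who poisoned the host?",
    answer="The butler",
    reference_steps=[
        "The butler prepared the tea",
        "The poison was found in the teacup",
        "Nobody else entered the kitchen",
    ],
)
reasoning = "The butler prepared the tea, and the poison was found in the teacup."
score = step_wise_score(item, reasoning)
print(f"{score:.3f}")  # → 0.667 (two of three reference steps covered)
```

Word overlap keeps the example self-contained; a semantic matcher or an LLM judge would be more robust to paraphrased reasoning, at the cost of extra dependencies.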
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques
