CorpusQA: A 10 Million Token Benchmark for Corpus-Level Analysis and Reasoning

Zhiyuan Lu; Chenliang Li; Yingcheng Shi; Weizhou Shen; Ming Yan; Fei Huang

arXiv:2601.14952·cs.CL·April 28, 2026

TL;DR

CorpusQA introduces a large-scale benchmark for evaluating language models' ability to perform reasoning across extensive document collections, highlighting current limitations and proposing new directions.

Contribution

The paper presents a novel 10-million-token benchmark and a data synthesis framework for challenging corpus-level reasoning tasks, improving evaluation and training of long-context models.

Findings

01. State-of-the-art models struggle with increasing input length.

02. Standard retrieval methods fail on large, dispersed corpora.

03. Memory-augmented architectures outperform traditional models.

Abstract

While large language models now handle million-token contexts, their capacity for reasoning across entire document repositories remains largely untested. Existing benchmarks are inadequate, as they are mostly limited to single long texts or rely on a "sparse retrieval" assumption: that answers can be derived from a few relevant chunks. This assumption fails for true corpus-level analysis, where evidence is highly dispersed across hundreds of documents and answers require global integration, comparison, and statistical aggregation. To address this critical gap, we introduce CorpusQA, a new benchmark scaling up to 10 million tokens, generated via a novel data synthesis framework. By decoupling reasoning from textual representation, this framework creates complex, computation-intensive queries with programmatically guaranteed ground-truth answers, challenging systems to perform holistic…
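The idea of decoupling reasoning from textual representation can be illustrated with a small sketch. This is not the paper's framework: the `Record` schema, the `render_document` template, and the example question are all hypothetical, chosen only to show how an answer computed on structured data stays correct regardless of how the documents are worded.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Record:
    # Hypothetical structured fact; the paper's actual schema is not shown here.
    company: str
    sector: str
    revenue_musd: float


def render_document(rec: Record) -> str:
    """Textual representation: turn one structured record into a short document."""
    return (
        f"{rec.company} operates in the {rec.sector} sector. "
        f"Last year it reported revenue of {rec.revenue_musd:.1f} million USD."
    )


def build_example(records):
    """Return (corpus, question, answer); the answer comes from the records, not the text."""
    corpus = [render_document(r) for r in records]
    question = "Which sector has the highest average revenue across the corpus?"

    # Reasoning: aggregate over every record, so the evidence is spread
    # across all documents rather than located in a few chunks.
    by_sector = {}
    for r in records:
        by_sector.setdefault(r.sector, []).append(r.revenue_musd)
    answer = max(by_sector, key=lambda s: mean(by_sector[s]))
    return corpus, question, answer


# Illustrative data only; not taken from the benchmark.
RECORDS = [
    Record("Acme", "energy", 120.0),
    Record("Borealis", "retail", 80.0),
    Record("Cobalt", "energy", 60.0),
    Record("Dune", "retail", 110.0),
]

corpus, question, answer = build_example(RECORDS)
```

Because the answer is derived from the records rather than from the rendered text, the same question can be posed over a corpus of any size or writing style without re-annotation, and answering it requires reading every document.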
