Navigating Large-Scale Document Collections: MuDABench for Multi-Document Analytical QA

Zhanli Li; Yixuan Cao; Lvzhou Luo; Ping Luo

arXiv:2604.22239·cs.CL·April 27, 2026

TL;DR

MuDABench is a new benchmark for multi-document analytical question answering over large, semi-structured collections, emphasizing extensive cross-document reasoning and aggregation. A multi-agent system improves performance but still lags behind human experts.

Contribution

The paper introduces MuDABench, a large-scale benchmark for multi-document analytical QA, and proposes a multi-agent workflow to enhance reasoning over extensive document collections.

Findings

01 Standard RAG systems perform poorly on MuDABench.

02 The multi-agent workflow improves reasoning and answer accuracy.

03 A significant gap remains between system performance and that of human experts.

Abstract

This paper introduces the task of analytical question answering over large, semi-structured document collections. We present MuDABench, a benchmark for multi-document analytical QA, where questions require extracting and synthesizing information across numerous documents to perform quantitative analysis. Unlike existing multi-document QA benchmarks that typically require information from only a few documents with limited cross-document reasoning, MuDABench demands extensive inter-document analysis and aggregation. Constructed via distant supervision by leveraging document-level metadata and annotated financial databases, MuDABench comprises over 80,000 pages and 332 analytical QA instances. We also propose an evaluation protocol that measures final answer accuracy and uses intermediate-fact coverage as an auxiliary diagnostic signal for the reasoning process. Experiments reveal that…
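The evaluation protocol described in the abstract pairs final answer accuracy with intermediate-fact coverage. The sketch below illustrates one way such a pair of scores could be computed; it is not the authors' implementation. The function names, the 1% relative tolerance for numeric answers, and the case-insensitive string matching of facts are all assumptions made for illustration, since this excerpt does not specify how either metric is defined.

```python
from typing import List


def answer_correct(pred: float, gold: float, rel_tol: float = 0.01) -> bool:
    """Hypothetical accuracy check: numeric answer within a relative tolerance.

    The 1% default tolerance is an assumption, not a value from the paper.
    """
    if gold == 0:
        return abs(pred) <= rel_tol
    return abs(pred - gold) / abs(gold) <= rel_tol


def fact_coverage(pred_facts: List[str], gold_facts: List[str]) -> float:
    """Hypothetical coverage: fraction of gold intermediate facts that the
    system also produced, compared after trimming and lowercasing."""
    if not gold_facts:
        return 1.0
    produced = {fact.strip().lower() for fact in pred_facts}
    hits = sum(1 for fact in gold_facts if fact.strip().lower() in produced)
    return hits / len(gold_facts)


# Toy example with made-up facts (not taken from MuDABench).
gold = ["company a revenue 2020: 10", "company b revenue 2020: 30"]
pred = ["Company A revenue 2020: 10"]
print(answer_correct(20.1, 20.0), fact_coverage(pred, gold))
```

Keeping the two scores separate reflects the abstract's framing: accuracy is the primary measure, while coverage serves as an auxiliary diagnostic that can show whether a wrong answer stems from missing evidence or from faulty aggregation.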

Code & Models

Repositories

Zhanli-Li/MuDABench
github

Datasets

Zhanli-Li/MuDABench
dataset · 844 dl
