MPDocBench-Parse: Benchmarking Practical Multi-page Document Parsing

Bangbang Zhou; Hangdi Xing; Yifan Chen; Jianjun Xu; Qi Zheng; Feiyu Gao; Zhibo Yang; Shuai Bai; Ming Yan; Jieping Ye; Hongtao Xie

arXiv:2605.22100 · cs.AI · May 22, 2026

TL;DR

MPDocBench-Parse is a benchmark for evaluating multi-page document parsing in realistic scenarios. It addresses the limitations of existing benchmarks, which target specific tasks or single-page, text-centric settings, by covering diverse document types, fine-grained evaluation protocols, and real-world challenges.

Contribution

The paper introduces a benchmark of 433 manually annotated documents (3,246 pages) with evaluation protocols for multi-page document parsing that cover semantic continuity, hierarchical structure recovery, and visual content preservation.

Findings

01. Existing models excel at text extraction but struggle with semantic continuity.

02. Models show limitations in visual content parsing and hierarchical structure recovery.

03. MPDocBench-Parse highlights the need for more advanced models for real-world document parsing.

Abstract

Document parsing converts visually rich documents into machine-readable structured representations, forming a crucial foundation for information systems. Although many benchmarks have been proposed for document parsing, they remain inadequate for realistic scenarios. Existing benchmarks either focus on specific tasks or assess only single-page, text-centric settings, making them insufficient for practical multi-page parsing. Moreover, they lack fine-grained evaluation of semantic continuity, hierarchical structure recovery, and visual content preservation. To address these gaps, we propose MPDocBench-Parse, a benchmark for multi-page document parsing in real-world applications. It contains 433 manually annotated documents with 3,246 pages, covering 15 document types in English and Chinese, with diverse layout styles, and supports document-level end-to-end evaluation. We further design a…
