MDPBench: A Benchmark for Multilingual Document Parsing in Real-World Scenarios

Zhang Li; Zhibo Lin; Qiang Liu; Ziyang Zhang; Shuo Zhang; Zidun Guo; Jiajun Song; Jiarui Zhang; Xiang Bai; Yuliang Liu

arXiv:2603.28130 · cs.CV · March 31, 2026

TL;DR

MDPBench is a comprehensive benchmark evaluating multilingual document parsing performance across diverse scripts, languages, and real-world conditions, highlighting current model limitations and guiding future improvements.

Contribution

MDPBench is the first multilingual document parsing benchmark covering both digital and photographed documents. It provides verified annotations and separate public and private evaluation splits, and it reveals performance gaps, especially for open-source models on non-Latin scripts and photographed documents.

Findings

01. Closed-source models such as Gemini3-Pro are more robust across conditions than open-source models.

02. Open-source models show a 17.8% performance drop on photographed documents.

03. Performance differs significantly across languages and document types.
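As a rough illustration of how an average performance drop between digital and photographed documents might be computed, here is a minimal sketch. The summary above does not say which metric the benchmark uses or whether the 17.8% figure is an absolute or a relative gap, so the sketch reports both. The model names, the scores, and the `average_drop` helper are hypothetical and are not taken from the paper.

```python
from statistics import mean

# Hypothetical per-model scores (higher is better). These are NOT results
# from the paper; they only illustrate the arithmetic.
scores = {
    "model_a": {"digital": 80.0, "photographed": 60.0},
    "model_b": {"digital": 70.0, "photographed": 56.0},
}


def average_drop(scores, base="digital", shifted="photographed"):
    """Return (mean absolute drop in points, mean relative drop in percent)."""
    absolute = [s[base] - s[shifted] for s in scores.values()]
    relative = [100.0 * (s[base] - s[shifted]) / s[base] for s in scores.values()]
    return mean(absolute), mean(relative)


abs_drop, rel_drop = average_drop(scores)
print(f"absolute: {abs_drop:.1f} points, relative: {rel_drop:.1f}%")
```

With these made-up numbers the absolute gap is 17.0 points while the relative gap is 22.5%, which shows why it matters to state which of the two a reported drop refers to.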

Abstract

We introduce the Multilingual Document Parsing Benchmark (MDPBench), the first benchmark for parsing multilingual digital and photographed documents. Document parsing has made remarkable strides, yet almost exclusively on clean, digital, well-formatted pages in a handful of dominant languages. No systematic benchmark exists to evaluate how models perform on digital and photographed documents across diverse scripts and low-resource languages. MDPBench comprises 3,400 document images spanning 17 languages, diverse scripts, and varied photographic conditions, with high-quality annotations produced through a rigorous pipeline of expert model labeling, manual correction, and human verification. To ensure fair comparison and prevent data leakage, we maintain separate public and private evaluation splits. Our comprehensive evaluation of both open-source and closed-source models uncovers a striking finding:…

Code & Models

Repositories

Yuliang-Liu/MultimodalOCR
github

Datasets

Delores-Lin/MDPBench
dataset · 7.6k downloads
