Real5-OmniDocBench: A Full-Scale Physical Reconstruction Benchmark for Robust Document Parsing in the Wild

Changda Zhou; Ziyue Gao; Xueqing Wang; Tingquan Gao; Cheng Cui; Jing Tang; Yi Liu

arXiv:2603.04205 · cs.CV · March 5, 2026

TL;DR

This paper introduces Real5-OmniDocBench, a comprehensive physical benchmark for evaluating and diagnosing the robustness of document parsing models in real-world scenarios, revealing significant gaps in current model performance.

Contribution

It presents the first full-scale physical reconstruction benchmark for OmniDocBench, enabling detailed analysis of factors affecting document parsing robustness in real-world conditions.

Findings

01. Models perform significantly worse in real-world scenarios compared to digital benchmarks.

02. The benchmark allows precise attribution of failure causes to geometric or optical distortions.

03. The reality gap in document parsing remains substantial, highlighting the need for more resilient models.

Abstract

While Vision-Language Models (VLMs) achieve near-perfect scores on digital document benchmarks like OmniDocBench, their performance in the unpredictable physical world remains largely unknown due to the lack of controlled yet realistic evaluations. We introduce Real5-OmniDocBench, the first benchmark that performs a full-scale, one-to-one physical reconstruction of the entire OmniDocBench v1.5 (1,355 images) across five critical real-world scenarios: Scanning, Warping, Screen-Photography, Illumination, and Skew. Unlike prior benchmarks that either lack digital correspondence or employ partial sampling, our complete ground-truth mapping enables, for the first time, rigorous factor-wise attribution of performance degradation, allowing us to pinpoint whether failures stem from geometric distortions, optical artifacts, or model limitations. Our benchmark establishes a challenging new standard…
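
The factor-wise attribution described in the abstract relies on every physical capture mapping back to one digital page. A minimal sketch of that comparison follows, assuming per-page scores are already available as plain dictionaries. The function name `scenario_gaps`, the page identifiers, and the score values are hypothetical and chosen only for illustration; they are not the paper's evaluation code, metric, or results.

```python
from statistics import mean


def scenario_gaps(digital, physical):
    """Return the mean per-page score drop (digital minus physical) per scenario.

    digital:  {page_id: score} measured on the original digital pages.
    physical: {scenario: {page_id: score}} measured on the physical captures.
    Only pages present in both mappings are compared, which is the kind of
    paired comparison a one-to-one reconstruction makes possible.
    """
    gaps = {}
    for scenario, scores in physical.items():
        shared = sorted(set(digital) & set(scores))
        if not shared:
            continue  # no paired pages for this scenario, so no gap to report
        gaps[scenario] = mean(digital[p] - scores[p] for p in shared)
    return gaps


# Toy numbers invented purely to exercise the function; NOT results from the paper.
digital = {"page_a": 0.90, "page_b": 0.80}
physical = {
    "Scanning": {"page_a": 0.85, "page_b": 0.75},
    "Warping": {"page_a": 0.60, "page_b": 0.50},
}

gaps = scenario_gaps(digital, physical)
worst = max(gaps, key=gaps.get)  # scenario with the largest mean drop
print(worst)
```

Because the scores are paired page by page, a large gap for one scenario can be attributed to that capture condition and not to a difference in which pages were sampled.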

Code & Models

Datasets

PaddlePaddle/Real5-OmniDocBench
dataset · 9.0k downloads

Taxonomy

Topics: Handwritten Text Recognition Techniques · Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications