DOCR-Inspector: Fine-Grained and Automated Evaluation of Document Parsing with VLM

Qintong Zhang; Junyuan Zhang; Zhifei Ren; Linke Ouyang; Zichen Wen; Junbo Niu; Yuan Qu; Bin Wang; Ka-Ho Chow; Conghui He; Wentao Zhang

arXiv:2512.10619·cs.CV·December 12, 2025

TL;DR

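DOCR-Inspector is a fine-grained, automated evaluation framework for document parsing. It detects specific error types in parser output and assesses overall quality; DOCR-Inspector-7B outperforms commercial and open-source models on real-world cases, and its quality assessments can guide parsing refinement.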

Contribution

This work presents a novel hierarchical error detection approach and a large annotated benchmark for comprehensive document parsing evaluation using vision language models.

Findings

01. DOCR-Inspector-7B outperforms commercial and open-source models on real-world cases.

02. The hierarchical Chain-of-Checklist reasoning improves error detection accuracy.

03. Quality assessments from DOCR-Inspector guide parsing refinement effectively.

Abstract

Document parsing aims to transform unstructured PDF images into semi-structured data, facilitating the digitization and utilization of information in diverse domains. While vision language models (VLMs) have significantly advanced this task, achieving reliable, high-quality parsing in real-world scenarios remains challenging. Common practice often selects the top-performing model on standard benchmarks. However, these benchmarks may carry dataset-specific biases, leading to inconsistent model rankings and limited correlation with real-world performance. Moreover, benchmark metrics typically provide only overall scores, which can obscure distinct error patterns in output. This raises a key challenge: how can we reliably and comprehensively assess document parsing quality in the wild? We address this problem with DOCR-Inspector, which formalizes document parsing assessment as fine-grained…

Code & Models

Models

ZQTTTT/DOCR-Inspector-7B (model, 30 downloads, 1 like)

Datasets

ZQTTTT/DOCRcase-Datasets (dataset, 9 downloads)

Taxonomy

Topics: Natural Language Processing Techniques · Multimodal Machine Learning Applications · Handwritten Text Recognition Techniques