When Good OCR Is Not Enough: Benchmarking OCR Robustness for Retrieval-Augmented Generation

Lin Sun; Wang Dexian; Jingang Huang; Linglin Zhang; Change Jia; Zhengwei Cheng; Xiangzheng Zhang

arXiv:2605.00911 · cs.CV · May 5, 2026


TL;DR

This paper introduces a new OCR benchmark for industrial RAG systems, revealing that high OCR accuracy does not always ensure effective downstream retrieval and generation in complex real-world documents.

Contribution

The authors present a comprehensive OCR benchmark for industrial RAG, highlighting the limitations of character-level metrics and analyzing factors affecting downstream performance.

Findings

01. High OCR accuracy does not guarantee strong RAG performance.

02. Structural and semantic errors impair retrieval even when WER and CER are low.

03. Performance degradation is consistent across OCR models and pipeline configurations.
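Finding 02 turns on how character-level metrics are computed. The sketch below is an illustration, not the benchmark's own scoring code. It implements the standard edit-distance definitions of CER and WER and applies them to a hypothetical OCR output that misreads a single digit. The example sentence and the helper names `levenshtein`, `cer`, and `wer` are illustrative assumptions.

```python
def levenshtein(a, b) -> int:
    """Edit distance between two sequences (unit-cost insert/delete/substitute).

    Works on strings (character level) and on lists of words (word level).
    """
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute (free when x == y)
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character edits / reference length."""
    return levenshtein(reference, hypothesis) / max(len(reference), 1)


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word edits / number of reference words."""
    ref_words = reference.split()
    return levenshtein(ref_words, hypothesis.split()) / max(len(ref_words), 1)


# Hypothetical example: the OCR engine misreads one digit ("1" -> "7").
reference = "Maximum daily dose: 1.5 mg per kg of body weight."
hypothesis = "Maximum daily dose: 7.5 mg per kg of body weight."

print(f"CER = {cer(reference, hypothesis):.3f}")  # CER = 0.020
print(f"WER = {wer(reference, hypothesis):.3f}")  # WER = 0.100
```

A CER of about 2% would look strong on a conventional leaderboard. Yet any downstream answer about the dose is now wrong by a factor of five. This is the kind of semantic error that character-level scores barely register.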

Abstract

Industrial Retrieval-Augmented Generation (RAG) systems depend on optical character recognition (OCR) to transform visual documents into text. Existing OCR benchmarks rely on character-level metrics, which inadequately measure downstream RAG effectiveness under real-world conditions. We introduce an OCR benchmark for industrial RAG systems covering 11 challenging document types, including extreme layouts, high-resolution pages, complex or watermarked backgrounds, historical documents with non-standard reading orders, visually decorated text, and documents containing tables and mathematical formulas. Evaluating recent SOTA OCR models under a controlled OCR-first RAG pipeline shows clear performance degradation on realistic industrial documents despite strong conventional benchmark scores. We find that high OCR accuracy does not necessarily translate into strong downstream RAG…
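To make "OCR-first RAG pipeline" concrete, here is a toy sketch in which OCR text is chunked, retrieved, and then checked for whether the gold answer survives into the retrieved context. It is a deliberately crude stand-in, not the paper's pipeline or evaluation protocol. The document chunks, the token-overlap retriever, and the helpers `tokenize`, `retrieve`, and `answer_in_context` are all hypothetical.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lower-cased alphanumeric tokens (a crude lexical representation)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, chunks: list[str]) -> str:
    """Top-1 retrieval: the chunk sharing the most tokens with the query."""
    q = tokenize(query)
    return max(chunks, key=lambda c: len(q & tokenize(c)))


def answer_in_context(query: str, chunks: list[str], gold_answer: str) -> bool:
    """Downstream proxy: does the retrieved context contain the gold answer?"""
    return gold_answer in retrieve(query, chunks)


# Hypothetical manual page, once as ground truth and once as OCR output.
clean_chunks = [
    "Section 4.2: The pump must be inspected every 500 operating hours.",
    "Section 4.3: Replace the filter cartridge every 200 operating hours.",
]
ocr_chunks = [
    "Section 4.2: The pump must be inspected every S00 operating hours.",  # "5" read as "S"
    "Section 4.3: Replace the filter cartridge every 200 operating hours.",
]
query = "How often must the pump be inspected?"

print(answer_in_context(query, clean_chunks, "500"))  # True
print(answer_in_context(query, ocr_chunks, "500"))    # False
```

Retrieval still selects the correct section in both cases, but the answer-bearing token is corrupted. The generator therefore never sees "500", even though the page-level character error is a single substitution. A real pipeline would use dense or hybrid retrieval and an LLM judge, but the failure mode is the same: OCR quality has to be judged by what reaches the generator.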


Code & Models

Repositories

Qihoo360/InduOCRBench (GitHub)

Datasets

qihoo360/InduOCRBench (dataset, 1.2k downloads)
