OCRGenBench: A Comprehensive Benchmark for Evaluating OCR Generative Capabilities
Peirong Zhang, Haowei Xu, Jiaxin Zhang, Xuhan Zheng, Guitao Xu, Yuyi Zhang, Junle Liu, Zhenhua Yang, Wei Zhou, Lianwen Jin

TL;DR
OCRGenBench is a comprehensive benchmark designed to evaluate the full spectrum of OCR generative capabilities across diverse tasks, challenging scenarios, and complex content to advance visual text synthesis models.
Contribution
The paper unifies text-centric T2I generation, text editing, and OCR-related image-to-image translation into a single benchmark, introduces a new evaluation metric, and provides extensive analysis of the limitations of current generative models.
Findings
Most models score below 60/100 on OCRGenScore.
Current models struggle with dense, small, or complex text.
Significant gaps remain in text localization and content preservation.
Abstract
Improving visual text synthesis has long been a challenging and evolving frontier for image generation models. While recent state-of-the-art (SOTA) models have made remarkable strides in text generation capabilities, existing benchmarks inadequately assess their true performance due to narrow scope (scene text and posters only), isolated evaluation (T2I generation or editing separately), and insufficient difficulty (lacking challenging scenarios). To bridge this gap, we pioneer the unification of text-centric T2I generation, text editing, and OCR-related image-to-image translation to evaluate a model's holistic visual text synthesis abilities, i.e., OCR generative capabilities. Accordingly, we propose OCRGenBench, the most comprehensive benchmark to date for evaluating these abilities. OCRGenBench covers five common text categories and 33 OCR generative tasks, encompassing T2I…
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Handwritten Text Recognition Techniques · Multimodal Machine Learning Applications
