CC-OCR V2: Benchmarking Large Multimodal Models for Literacy in Real-world Document Processing
Zhipeng Xu, Junhao Ji, Zulong Chen, Zhenghao Liu, Qing Liu, Chunyi Peng, Zubao Qin, Ze Xu, Jianqiang Wan, Jun Tang, Zhibo Yang, Shuai Bai, Dayiheng Liu

TL;DR
CC-OCR V2 introduces a comprehensive benchmark for evaluating large multimodal models on real-world OCR tasks, highlighting their current limitations in practical document processing scenarios.
Contribution
The paper presents CC-OCR V2, a new challenging benchmark tailored to real-world OCR applications, with extensive experiments showing current models' performance gaps.
Findings
Current LMMs perform poorly on real-world OCR tasks.
State-of-the-art models show significant performance degradation.
The benchmark includes 7,093 high-difficulty samples across 5 OCR-centric tracks.
Abstract
Large Multimodal Models (LMMs) have recently shown strong performance on Optical Character Recognition (OCR) tasks, demonstrating their promising capability in document literacy. However, their effectiveness in real-world applications remains underexplored, as existing benchmarks adopt task scopes misaligned with practical applications and assume homogeneous acquisition conditions. To address this gap, we introduce CC-OCR V2, a comprehensive and challenging OCR benchmark tailored to real-world document processing. CC-OCR V2 focuses on practical enterprise document processing tasks and incorporates hard and corner cases that are critical yet underrepresented in prior benchmarks, covering 5 major OCR-centric tracks: text recognition, document parsing, document grounding, key information extraction, and document question answering, comprising 7,093 high-difficulty samples. Extensive…
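Because the benchmark spans five tracks, a single headline number has to combine per-track results somehow. The abstract excerpt here does not say how CC-OCR V2 scores or aggregates tracks. The sketch below assumes an unweighted macro average, so each track counts equally regardless of its sample count. The track identifiers, the `macro_average` helper, and the scores are hypothetical placeholders, not reported results.

```python
from statistics import mean

# The five OCR-centric tracks named in the abstract (identifiers are hypothetical).
TRACKS = (
    "text_recognition",
    "document_parsing",
    "document_grounding",
    "key_information_extraction",
    "document_question_answering",
)


def macro_average(track_scores: dict[str, float]) -> float:
    """Average per-track scores so every track counts equally.

    Hypothetical aggregation scheme, not the paper's official metric.
    """
    missing = [t for t in TRACKS if t not in track_scores]
    if missing:
        raise ValueError(f"missing scores for tracks: {missing}")
    return mean(track_scores[t] for t in TRACKS)


# Placeholder scores for illustration only.
scores = {
    "text_recognition": 0.8,
    "document_parsing": 0.6,
    "document_grounding": 0.4,
    "key_information_extraction": 0.7,
    "document_question_answering": 0.5,
}
print(round(macro_average(scores), 2))  # 0.6
```

A macro average prevents a large track from dominating the overall score. A sample-weighted (micro) average would instead reflect the track sizes.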
