CC-OCR V2: Benchmarking Large Multimodal Models for Literacy in Real-world Document Processing

Zhipeng Xu; Junhao Ji; Zulong Chen; Zhenghao Liu; Qing Liu; Chunyi Peng; Zubao Qin; Ze Xu; Jianqiang Wan; Jun Tang; Zhibo Yang; Shuai Bai; Dayiheng Liu

arXiv:2605.03903 · cs.CL · May 6, 2026

TL;DR

CC-OCR V2 introduces a comprehensive benchmark for evaluating large multimodal models on real-world OCR tasks, highlighting their current limitations in practical document processing scenarios.

Contribution

The paper presents CC-OCR V2, a challenging new benchmark tailored to real-world OCR applications, together with extensive experiments that expose performance gaps in current models.

Findings

01. Current LMMs perform poorly on real-world OCR tasks.

02. State-of-the-art models show significant performance degradation on CC-OCR V2.

03. The benchmark includes 7,093 high-difficulty samples across 5 OCR-centric tracks.

Abstract

Large Multimodal Models (LMMs) have recently shown strong performance on Optical Character Recognition (OCR) tasks, demonstrating their promising capability in document literacy. However, their effectiveness in real-world applications remains underexplored, as existing benchmarks adopt task scopes misaligned with practical applications and assume homogeneous acquisition conditions. To address this gap, we introduce CC-OCR V2, a comprehensive and challenging OCR benchmark tailored to real-world document processing. CC-OCR V2 focuses on practical enterprise document processing tasks and incorporates hard and corner cases that are critical yet underrepresented in prior benchmarks, covering 5 major OCR-centric tracks: text recognition, document parsing, document grounding, key information extraction, and document question answering, comprising 7,093 high-difficulty samples. Extensive…

Code & Models

Repositories

eioss/CC-OCR-V2
github

Datasets

Eioss/CC-OCR-V2
dataset · 9.9k downloads
