Ocean-OCR: Towards General OCR Application via a Vision-Language Model
Song Chen, Xinyu Guo, Yadong Li, Tao Zhang, Mingan Lin, Dongdong Kuang, Youwei Zhang, Lingfeng Ming, Fengyu Zhang, Yuran Wang, Jianhua Xu, Zenan Zhou, Weipeng Chen

TL;DR
Ocean-OCR is a large multimodal language model specifically designed to excel in various optical character recognition tasks, outperforming existing OCR models and demonstrating versatile understanding across multiple text-related scenarios.
Contribution
This paper introduces Ocean-OCR, a 3-billion-parameter vision-language model with state-of-the-art OCR performance and broad applicability to general understanding tasks.
Findings
Outperforms professional OCR models like TextIn and PaddleOCR
Excels in document understanding, scene text, and handwritten recognition
Demonstrates robust OCR capabilities across diverse scenarios
Abstract
Multimodal large language models (MLLMs) have shown impressive capabilities across various domains, excelling at processing and understanding information from multiple modalities. Despite this rapid progress, insufficient OCR ability still hinders MLLMs from excelling in text-related tasks. In this paper, we present Ocean-OCR, a 3B MLLM with state-of-the-art performance on various OCR scenarios and comparable understanding ability on general tasks. We employ a Native Resolution ViT to enable variable-resolution input and use a substantial collection of high-quality OCR datasets to enhance model performance. We demonstrate the superiority of Ocean-OCR through comprehensive experiments on open-source OCR benchmarks and across various OCR scenarios. These scenarios encompass document understanding, scene text recognition, and handwritten recognition, highlighting…
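As a rough illustration of what variable-resolution input means for a ViT-style encoder, the sketch below computes how many patch tokens an image yields when it is kept at its own size instead of being resized to a fixed square. The function name `patch_grid`, the patch size of 14, and the round-up (pad to a whole patch) behavior are assumptions for illustration only; the abstract does not specify them.

```python
import math


def patch_grid(height: int, width: int, patch_size: int = 14) -> tuple[int, int, int]:
    """Return (rows, cols, num_tokens) for an image kept at its native size.

    patch_size=14 and rounding up to a whole patch are illustrative
    assumptions, not details taken from the Ocean-OCR paper.
    """
    if height <= 0 or width <= 0 or patch_size <= 0:
        raise ValueError("height, width, and patch_size must be positive")
    rows = math.ceil(height / patch_size)  # patches along the vertical axis
    cols = math.ceil(width / patch_size)   # patches along the horizontal axis
    return rows, cols, rows * cols


# A square image and a wide, short text strip produce different token counts,
# so the aspect ratio of the strip is preserved rather than squashed.
square = patch_grid(224, 224)
strip = patch_grid(100, 300)
```

Under these assumptions, the token count grows with image area, which is why dense documents at high resolution cost more tokens than small scene-text crops.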
Taxonomy
Topics: Advanced Computational Techniques and Applications · Image Retrieval and Classification Techniques
