Ocean-OCR: Towards General OCR Application via a Vision-Language Model

Song Chen; Xinyu Guo; Yadong Li; Tao Zhang; Mingan Lin; Dongdong Kuang; Youwei Zhang; Lingfeng Ming; Fengyu Zhang; Yuran Wang; Jianhua Xu; Zenan Zhou; Weipeng Chen

arXiv:2501.15558 · cs.CV · January 28, 2025

TL;DR

Ocean-OCR is a multimodal large language model designed for optical character recognition. It outperforms professional OCR models such as TextIn and PaddleOCR and handles a range of text-related scenarios.

Contribution

This paper introduces Ocean-OCR, a 3-billion-parameter vision-language model with state-of-the-art OCR performance and comparable understanding ability on general tasks.

Findings

01. Outperforms professional OCR models like TextIn and PaddleOCR
02. Excels in document understanding, scene text, and handwritten recognition
03. Demonstrates robust OCR capabilities across diverse scenarios

Abstract

Multimodal large language models (MLLMs) have shown impressive capabilities across various domains, excelling in processing and understanding information from multiple modalities. Despite the rapid progress made previously, insufficient OCR ability hinders MLLMs from excelling in text-related tasks. In this paper, we present Ocean-OCR, a 3B MLLM with state-of-the-art performance on various OCR scenarios and comparable understanding ability on general tasks. We employ Native Resolution ViT to enable variable resolution input and utilize a substantial collection of high-quality OCR datasets to enhance the model performance. We demonstrate the superiority of Ocean-OCR through comprehensive experiments on open-source OCR benchmarks and across various OCR scenarios. These scenarios encompass document understanding, scene text recognition, and handwritten recognition, highlighting…
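
To illustrate what "variable resolution input" means in practice, the following is a minimal sketch of how a native-resolution vision encoder might turn images of different sizes into different numbers of patch tokens, instead of resizing everything to one fixed square. The patch size of 14 and the helper names `patch_grid` and `token_count` are assumptions made for illustration; the abstract does not state Ocean-OCR's actual patch size or preprocessing.

```python
import math


def patch_grid(height: int, width: int, patch: int = 14) -> tuple[int, int]:
    """Return (rows, cols) of patches covering an image at its native size.

    `patch` is an assumed patch side length in pixels, not a value taken
    from the paper. Rounding up means edge pixels are covered; a real
    pipeline would pad or slightly resize the image to match.
    """
    return math.ceil(height / patch), math.ceil(width / patch)


def token_count(height: int, width: int, patch: int = 14) -> int:
    """Number of visual tokens produced for one image (rows * cols)."""
    rows, cols = patch_grid(height, width, patch)
    return rows * cols


# A small square image and a tall document page yield different sequence lengths.
square_tokens = token_count(224, 224)
page_tokens = token_count(1120, 448)
```

Under these assumptions a tall document page keeps its aspect ratio and simply produces a longer token sequence, which is why native-resolution encoders are attractive for dense text: small glyphs are not squashed by a fixed-size resize.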

Code & Models

Repositories

guoxy25/Ocean-OCR · PyTorch · Official

Taxonomy

Topics: Advanced Computational Techniques and Applications · Image Retrieval and Classification Techniques