General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model

Haoran Wei; Chenglong Liu; Jinyue Chen; Jia Wang; Lingyu Kong; Yanming Xu; Zheng Ge; Liang Zhao; Jianjian Sun; Yuang Peng; Chunrui Han; Xiangyu Zhang

arXiv:2409.01704 · cs.CV · September 4, 2024 · 5 cites

TL;DR

The paper introduces GOT, a unified end-to-end model with 580M parameters, capable of handling diverse OCR tasks including various character types and formats, advancing towards a comprehensive OCR-2.0 system.

Contribution

Proposes the General OCR Theory and the GOT model, a versatile, end-to-end solution for multiple OCR tasks with interactive and multi-page capabilities.

Findings

01. GOT outperforms existing OCR models in diverse tasks.
02. Supports multiple output formats and interactive features.
03. Effective handling of various character types and document styles.

Abstract

Traditional OCR systems (OCR-1.0) are increasingly unable to meet users' needs due to the growing demand for intelligent processing of man-made optical characters. In this paper, we collectively refer to all artificial optical signals (e.g., plain texts, math/molecular formulas, tables, charts, sheet music, and even geometric shapes) as "characters" and propose the General OCR Theory along with an excellent model, namely GOT, to promote the arrival of OCR-2.0. GOT, with 580M parameters, is a unified, elegant, end-to-end model consisting of a high-compression encoder and a long-context decoder. As an OCR-2.0 model, GOT can handle all the above "characters" under various OCR tasks. On the input side, the model supports commonly used scene- and document-style images in slice and whole-page styles. On the output side, GOT can generate plain or formatted results…

Code & Models

Repositories

ucas-haoranwei/got-ocr2.0
pytorch

Taxonomy

Topics: Handwritten Text Recognition Techniques · Natural Language Processing Techniques · Mathematics, Computing, and Information Processing