General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
Haoran Wei, Chenglong Liu, Jinyue Chen, Jia Wang, Lingyu Kong, Yanming Xu, Zheng Ge, Liang Zhao, Jianjian Sun, Yuang Peng, Chunrui Han, Xiangyu Zhang

TL;DR
The paper introduces GOT, a unified end-to-end model with 580M parameters, capable of handling diverse OCR tasks including various character types and formats, advancing towards a comprehensive OCR-2.0 system.
Contribution
Proposes the General OCR Theory and the GOT model, a versatile, end-to-end solution for multiple OCR tasks with interactive and multi-page capabilities.
Findings
GOT outperforms existing OCR models in diverse tasks.
Supports multiple output formats and interactive features.
Effective handling of various character types and document styles.
Abstract
Traditional OCR systems (OCR-1.0) are increasingly unable to meet people's usage due to the growing demand for intelligent processing of man-made optical characters. In this paper, we collectively refer to all artificial optical signals (e.g., plain texts, math/molecular formulas, tables, charts, sheet music, and even geometric shapes) as "characters" and propose the General OCR Theory along with an excellent model, namely GOT, to promote the arrival of OCR-2.0. The GOT, with 580M parameters, is a unified, elegant, and end-to-end model, consisting of a high-compression encoder and a long-contexts decoder. As an OCR-2.0 model, GOT can handle all the above "characters" under various OCR tasks. On the input side, the model supports commonly used scene- and document-style images in slice and whole-page styles. On the output side, GOT can generate plain or formatted results…
Code & Models
- 🤗 abhinand/GOT-OCR-2.0-unofficial (model) · 6 dl · ♡ 17
- 🤗 stepfun-ai/GOT-OCR2_0 (model) · 56k dl · ♡ 1531
- 🤗 mallapraveen/GOT-OCR2_0 (model) · 4 dl
- 🤗 srimanth-d/GOT_CPU (model) · 42 dl · ♡ 11
- 🤗 RufusRubin777/GOT-OCR2_0_CPU (model) · 6 dl
- 🤗 Maltokar/GOT_OCR_MP (model) · 2 dl
- 🤗 aarishshahmohsin/got_ocr_2 (model) · 5 dl
- 🤗 tdnathmlenthusiast/tester (model) · 2 dl
- 🤗 uzumaki06/OCR2.0 (model) · 1 dl
- 🤗 philipp-zettl/GOT-OCR2_0 (model)
Taxonomy
Topics: Handwritten Text Recognition Techniques · Natural Language Processing Techniques · Mathematics, Computing, and Information Processing
