PP-OCRv2: Bag of Tricks for Ultra Lightweight OCR System

Yuning Du; Chenxia Li; Ruoyu Guo; Cheng Cui; Weiwei Liu; Jun Zhou; Bin Lu; Yehua Yang; Qiwen Liu; Xiaoguang Hu; Dianhai Yu; Yanjun Ma

arXiv:2109.03144 · cs.CV · October 13, 2021 · 26 cites

TL;DR

PP-OCRv2 is an improved ultra lightweight OCR system that applies a bag of tricks (CML, CopyPaste, LCNet, U-DML, and Enhanced CTCLoss) to raise accuracy over PP-OCR while keeping inference cost unchanged, which makes it practical for real-world applications.

Contribution

The paper introduces PP-OCRv2, incorporating multiple novel training techniques to significantly boost OCR accuracy without increasing inference costs.

Findings

01. PP-OCRv2 achieves 7% higher precision than PP-OCR at the same inference cost.

02. Its accuracy is comparable to that of the PP-OCR server models, which use ResNet-series backbones.

03. All models and code are open-sourced in the PaddleOCR GitHub repository.

Abstract

Optical Character Recognition (OCR) systems have been widely used in a variety of application scenarios, yet designing an OCR system is still a challenging task. In previous work, we proposed a practical ultra lightweight OCR system (PP-OCR) to balance accuracy against efficiency. To improve the accuracy of PP-OCR while keeping its high efficiency, in this paper we propose a more robust OCR system, i.e., PP-OCRv2. We introduce a bag of tricks to train a better text detector and a better text recognizer, which includes Collaborative Mutual Learning (CML), CopyPaste, Lightweight CPU Network (LCNet), Unified-Deep Mutual Learning (U-DML), and Enhanced CTCLoss. Experiments on real data show that the precision of PP-OCRv2 is 7% higher than that of PP-OCR under the same inference cost. It is also comparable to the server models of PP-OCR, which use the ResNet series as backbones. All of the above…
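
Two of the tricks named in the abstract, CML and U-DML, build on mutual learning, where two networks are trained together and each is pulled toward the other's predictions. The sketch below shows only the generic idea as it appears in classic Deep Mutual Learning: a symmetric KL-divergence term between the softmax outputs of two students, which would be added to each student's ordinary supervised loss. The exact CML and U-DML formulations are not given on this page, so the function names (`softmax`, `kl_divergence`, `mutual_learning_loss`) and the equal weighting of the two KL directions are illustrative assumptions, not the paper's implementation.

```python
import math


def softmax(logits):
    """Convert raw scores to a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def mutual_learning_loss(logits_a, logits_b):
    """Symmetric KL term between two students' predictions.

    Assumption: both directions are weighted equally (0.5 each); a real
    training recipe may weight or combine these terms differently.
    """
    p_a = softmax(logits_a)
    p_b = softmax(logits_b)
    return 0.5 * (kl_divergence(p_a, p_b) + kl_divergence(p_b, p_a))


if __name__ == "__main__":
    agree = mutual_learning_loss([2.0, 0.5, -1.0], [2.0, 0.5, -1.0])
    disagree = mutual_learning_loss([2.0, 0.5, -1.0], [-1.0, 0.5, 2.0])
    print("students agree:   ", agree)
    print("students disagree:", disagree)
```

The term is zero when both students produce the same distribution and grows as they diverge, so minimizing it alongside the supervised loss encourages the two networks to agree.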

Taxonomy

Topics: Handwritten Text Recognition Techniques · Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques

Methods: Average Pooling · Global Average Pooling · Max Pooling · 1x1 Convolution · Kaiming Initialization · PP-OCR · Residual Block · Convolution · Residual Connection