Context Perception Parallel Decoder for Scene Text Recognition

Yongkun Du; Zhineng Chen; Caiyan Jia; Xiaoting Yin; Chenxia Li; Yuning Du; Yu-Gang Jiang

arXiv:2307.12270 · cs.CV · October 10, 2023 · 1 citation

TL;DR

This paper introduces the Context Perception Parallel Decoder (CPPD) for scene text recognition, which combines the speed of parallel decoding with improved accuracy by modeling linguistic and visual context.

Contribution

The paper proposes CPPD, a parallel decoder for scene text recognition that adds context perception modules (a character counting module and a character ordering module) to a single decoding pass, achieving high accuracy with fast inference.

Findings

1. CPPD achieves accuracy comparable to autoregressive (AR) models.
2. CPPD runs approximately 8 times faster than AR-based models.
3. Plugging the proposed modules into existing STR decoders improves their accuracy.
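The speed finding comes down to how many sequential decoder calls each scheme needs. The following is a minimal Python sketch under stated assumptions, not CPPD code: `toy_step` and `toy_all_positions` are hypothetical stand-ins for a trained decoder that simply read off a known word, and `max_len=25` is an assumed maximum label length. It shows only that AR decoding needs one call per character plus one for the end token, while PD needs a single call regardless of length. It does not model accuracy or reproduce the paper's measured ~8x speed-up.

```python
from itertools import takewhile
from typing import Callable, List, Tuple

EOS = "<eos>"  # hypothetical end-of-sequence token
PAD = "<pad>"  # hypothetical padding token for unused parallel slots


def decode_autoregressive(step: Callable[[List[str]], str], max_len: int) -> Tuple[List[str], int]:
    """AR decoding: one decoder call per character, each conditioned on the prefix."""
    prefix: List[str] = []
    calls = 0
    for _ in range(max_len):
        calls += 1
        token = step(prefix)  # depends on every previously emitted character
        if token == EOS:
            break
        prefix.append(token)
    return prefix, calls


def decode_parallel(all_positions: Callable[[int], List[str]], max_len: int) -> Tuple[List[str], int]:
    """PD decoding: a single decoder call emits a prediction for every slot."""
    slots = all_positions(max_len)  # one pass, no dependence on earlier outputs
    # Keep predictions up to the first padding slot.
    return list(takewhile(lambda t: t != PAD, slots)), 1


# Hypothetical toy "decoders" that just read off a known word.
WORD = "coffee"


def toy_step(prefix: List[str]) -> str:
    return WORD[len(prefix)] if len(prefix) < len(WORD) else EOS


def toy_all_positions(max_len: int) -> List[str]:
    return [WORD[i] if i < len(WORD) else PAD for i in range(max_len)]


if __name__ == "__main__":
    print(decode_autoregressive(toy_step, max_len=25))     # 7 sequential calls
    print(decode_parallel(toy_all_positions, max_len=25))  # 1 call
```

The AR call count grows with word length while the PD count stays at one. In real models each call also has a different cost, so wall-clock ratios such as the reported ~8x have to be measured rather than derived from call counts.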

Abstract

Scene text recognition (STR) methods have struggled to attain both high accuracy and fast inference speed. Autoregressive (AR)-based models perform recognition character by character, showing superior accuracy but slow inference. Alternatively, parallel decoding (PD)-based models infer all characters in a single decoding pass, offering faster inference speed but generally worse accuracy. We first present an empirical study of AR decoding in STR, and discover that the AR decoder not only models linguistic context, but also provides guidance on visual context perception. Consequently, we propose Context Perception Parallel Decoder (CPPD) to predict the character sequence in a PD pass. CPPD devises a character counting module to infer the occurrence count of each character, and a character ordering module to deduce the content-free reading order and…
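To make the character counting module's task ("infer the occurrence count of each character") concrete, one can compute the counts it would be asked to predict for a given label. This is a minimal sketch assuming a hypothetical 36-symbol lowercase alphanumeric charset and a hypothetical helper `char_count_targets`; the paper's actual vocabulary, label encoding, and loss are not given here. It also shows why counting alone is not enough: an anagram produces the same count vector, so reading order has to come from elsewhere (in CPPD, the character ordering module).

```python
from collections import Counter
from typing import List

# Hypothetical 36-class charset (an assumption, not the paper's vocabulary).
CHARSET = "0123456789abcdefghijklmnopqrstuvwxyz"


def char_count_targets(label: str, charset: str = CHARSET) -> List[int]:
    """Return how many times each charset symbol occurs in `label`.

    Characters outside `charset` are ignored (Counter yields 0 for them).
    """
    counts = Counter(label)
    return [counts[ch] for ch in charset]


if __name__ == "__main__":
    target = char_count_targets("hello")
    # Show only the non-zero entries, as {char: count}.
    print({ch: n for ch, n in zip(CHARSET, target) if n})  # {'e': 1, 'h': 1, 'l': 2, 'o': 1}
    # Counting is order-free: an anagram yields the identical target,
    # so a separate ordering signal is still required.
    print(target == char_count_targets("olleh"))  # True
```

When every character of the label is in the charset, the counts sum to the label length, so a per-character count prediction also implies a length prediction.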

Code & Models

Models: topdu/OpenOCR (Hugging Face)

Taxonomy

Topics: Handwritten Text Recognition Techniques · Text and Document Classification Technologies · Natural Language Processing Techniques

Methods: SPEED (Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings)