PIMNet: A Parallel, Iterative and Mimicking Network for Scene Text Recognition

Zhi Qiao; Yu Zhou; Jin Wei; Wei Wang; Yuan Zhang; Ning Jiang; Hongbin Wang; Weiping Wang

arXiv:2109.04145·cs.CV·September 10, 2021·1 cites


TL;DR

PIMNet is a scene text recognition model that combines parallel prediction, iterative refinement, and mimicking learning to balance high accuracy with fast inference.

Contribution

It introduces a parallel, iterative, and mimicking network architecture that trains end-to-end without pre-training, improving both accuracy and efficiency in scene text recognition.

Findings

01 Outperforms existing methods on public benchmarks.

02 Achieves faster inference than autoregressive models with comparable accuracy.

03 Balances speed and accuracy through its iterative and mimicking mechanisms.

Abstract

Nowadays, scene text recognition has attracted more and more attention due to its various applications. Most state-of-the-art methods adopt an encoder-decoder framework with attention mechanism, which generates text autoregressively from left to right. Despite the convincing performance, the speed is limited because of the one-by-one decoding strategy. As opposed to autoregressive models, non-autoregressive models predict the results in parallel with a much shorter inference time, but the accuracy falls behind the autoregressive counterpart considerably. In this paper, we propose a Parallel, Iterative and Mimicking Network (PIMNet) to balance accuracy and efficiency. Specifically, PIMNet adopts a parallel attention mechanism to predict the text faster and an iterative generation mechanism to make the predictions more accurate. In each iteration, the context information is fully…
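The contrast the abstract draws between one-by-one decoding and parallel, iterative decoding can be illustrated with a toy sketch. This is not the authors' implementation: `toy_predict` is a hypothetical stand-in for a parallel decoder, and the "commit the most confident positions each round" rule is an assumed, generic refinement scheme, since this excerpt does not specify PIMNet's update rule. The point is only that the decoder is called once per iteration rather than once per character.

```python
MASK = "_"  # placeholder for a position that has not been committed yet


def toy_predict(tokens, target):
    """Hypothetical parallel decoder: one call scores every position at once.

    Returns a (character, confidence) pair per position. Confidence rises
    when neighbouring positions are already committed, mimicking the use
    of context. A real model would predict from image features instead.
    """
    preds = []
    for i, ch in enumerate(target):
        neighbours = [tokens[j] for j in (i - 1, i + 1) if 0 <= j < len(tokens)]
        known = sum(1 for n in neighbours if n != MASK)
        preds.append((ch, 0.5 + 0.2 * known + 0.01 * i))
    return preds


def iterative_parallel_decode(target, num_iters):
    """Fill all positions in at most num_iters parallel decoder calls."""
    length = len(target)
    tokens = [MASK] * length
    per_iter = -(-length // num_iters)  # ceiling division
    calls = 0
    for _ in range(num_iters):
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        if not masked:
            break
        preds = toy_predict(tokens, target)  # all positions in parallel
        calls += 1
        # Assumed rule: commit the most confident still-masked positions.
        masked.sort(key=lambda i: preds[i][1], reverse=True)
        for i in masked[:per_iter]:
            tokens[i] = preds[i][0]
    return "".join(tokens), calls


text, calls = iterative_parallel_decode("HELLO", num_iters=3)
print(text, calls)
```

In this sketch a five-character word is completed in three decoder calls, whereas a left-to-right autoregressive decoder would need one call per character.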


Code & Models

Repositories

pay20y/pimnet · tf · Official


Taxonomy

Topics: Handwritten Text Recognition Techniques · Image Processing and 3D Reconstruction · Advanced Image and Video Retrieval Techniques