PIMNet: A Parallel, Iterative and Mimicking Network for Scene Text Recognition
Zhi Qiao, Yu Zhou, Jin Wei, Wei Wang, Yuan Zhang, Ning Jiang, Hongbin Wang, Weiping Wang

TL;DR
PIMNet is a novel scene text recognition model that combines parallel prediction with iterative refinement and mimicking learning to achieve a balance of high accuracy and fast inference speed.
Contribution
It introduces a parallel, iterative, and mimicking network architecture that trains end-to-end without pre-training, improving both accuracy and efficiency in scene text recognition.
Findings
Outperforms existing methods on public benchmarks.
Achieves faster inference with comparable accuracy to autoregressive models.
Effectively balances speed and accuracy through iterative and mimicking mechanisms.
Abstract
Nowadays, scene text recognition has attracted more and more attention due to its various applications. Most state-of-the-art methods adopt an encoder-decoder framework with attention mechanism, which generates text autoregressively from left to right. Despite the convincing performance, the speed is limited because of the one-by-one decoding strategy. As opposed to autoregressive models, non-autoregressive models predict the results in parallel with a much shorter inference time, but the accuracy falls behind the autoregressive counterpart considerably. In this paper, we propose a Parallel, Iterative and Mimicking Network (PIMNet) to balance accuracy and efficiency. Specifically, PIMNet adopts a parallel attention mechanism to predict the text faster and an iterative generation mechanism to make the predictions more accurate. In each iteration, the context information is fully…
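The abstract's combination of parallel prediction and iterative refinement can be illustrated with a small sketch. The abstract is truncated before it states the exact update rule, so everything below is an assumption for illustration rather than the authors' implementation: the names `iterative_parallel_decode` and `make_toy_predictor` are hypothetical, the confidence-based rule for choosing which positions to finalize in each pass is assumed, and the toy predictor stands in for the real attention decoder.

```python
from typing import Callable, List, Tuple

MASK = "<mask>"

# A predictor maps the current (partially masked) sequence to one
# (character, confidence) guess per position, for all positions at once.
Predictor = Callable[[List[str]], List[Tuple[str, float]]]


def iterative_parallel_decode(
    predict: Predictor, length: int, iterations: int
) -> List[str]:
    """Decode `length` characters using `iterations` parallel passes.

    Assumed rule (not taken from the paper): after each pass, finalize the
    most confident still-masked positions and leave the rest masked so the
    next pass can use the newly decoded characters as context.
    """
    tokens = [MASK] * length
    for step in range(1, iterations + 1):
        guesses = predict(tokens)  # one parallel pass over every position
        # Total number of positions that should be finalized after this pass.
        target = (length * step + iterations - 1) // iterations
        open_positions = [i for i, t in enumerate(tokens) if t == MASK]
        quota = target - (length - len(open_positions))
        # Finalize the most confident masked positions first.
        ranked = sorted(open_positions, key=lambda i: guesses[i][1], reverse=True)
        for i in ranked[:quota]:
            tokens[i] = guesses[i][0]
    return tokens


def make_toy_predictor(truth: str) -> Predictor:
    """Hypothetical stand-in for a real decoder: always guesses the true
    character, with confidence that grows as neighbours get decoded."""

    def predict(tokens: List[str]) -> List[Tuple[str, float]]:
        out = []
        for i, ch in enumerate(truth):
            neighbours = [tokens[j] for j in (i - 1, i + 1) if 0 <= j < len(tokens)]
            known = sum(t != MASK for t in neighbours)
            out.append((ch, 0.5 + 0.2 * known + 0.01 * i))
        return out

    return predict


if __name__ == "__main__":
    decoded = iterative_parallel_decode(make_toy_predictor("PIMNET"), 6, 3)
    print("".join(decoded))
```

The number of decoder passes here is fixed by `iterations` rather than by the text length, which is where the speed advantage over one-character-at-a-time autoregressive decoding would come from under these assumptions.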
Taxonomy
Topics: Handwritten Text Recognition Techniques · Image Processing and 3D Reconstruction · Advanced Image and Video Retrieval Techniques
