Linguistic More: Taking a Further Step toward Efficient and Accurate Scene Text Recognition
Boqiang Zhang, Hongtao Xie, Yuxin Wang, Jianjun Xu, Yongdong Zhang

TL;DR
This paper introduces the LPV model that enhances scene text recognition by integrating linguistic perception into vision models, addressing attention drift and visual feature limitations for improved accuracy and efficiency.
Contribution
The paper proposes a novel LPV model with a Cascade Position Attention mechanism and a Global Linguistic Reconstruction Module to incorporate linguistic knowledge into vision-based scene text recognition.
Findings
Achieves state-of-the-art accuracy of 92.4% on scene text recognition.
Maintains low model complexity with only 8.11 million parameters.
Outperforms previous methods in both accuracy and efficiency.
Abstract
Vision models have gained increasing attention due to their simplicity and efficiency in the Scene Text Recognition (STR) task. However, lacking the perception of linguistic knowledge and information, recent vision models suffer from two problems: (1) the pure vision-based query results in attention drift, which usually causes poor recognition and is summarized as the linguistic insensitive drift (LID) problem in this paper; (2) the visual feature is suboptimal for recognition in some vision-missing cases (e.g., occlusion). To address these issues, we propose a Linguistic Perception Vision model (LPV), which explores the linguistic capability of the vision model for accurate text recognition. To alleviate the LID problem, we introduce a Cascade Position Attention (CPA) mechanism that obtains high-quality and accurate attention maps through step-wise…
Taxonomy
Topics: Multimodal Machine Learning Applications · Handwritten Text Recognition Techniques · Text and Document Classification Technologies
