Towards Accurate Scene Text Recognition with Semantic Reasoning Networks
Deli Yu, Xuan Li, Chengquan Zhang, Junyu Han, Jingtuo Liu, Errui Ding

TL;DR
This paper introduces a semantic reasoning network (SRN) with a global semantic reasoning module (GSRM) that captures semantic context in scene text recognition, outperforming RNN-based methods in accuracy and efficiency.
Contribution
The paper presents a novel end-to-end framework with a global semantic reasoning module that overcomes RNN limitations, improving scene text recognition accuracy and speed.
Findings
Achieved state-of-the-art results on 7 benchmarks.
Demonstrated robustness across various text types.
Significantly faster than RNN-based methods.
Abstract
A scene text image contains two levels of content: visual texture and semantic information. Although previous scene text recognition methods have made great progress over the past few years, mining semantic information to assist text recognition has attracted less attention, and only RNN-like structures have been explored to model semantic information implicitly. However, we observe that RNN-based methods have some obvious shortcomings, such as a time-dependent decoding manner and one-way serial transmission of semantic context, which greatly limit both the benefit of semantic information and computational efficiency. To mitigate these limitations, we propose a novel end-to-end trainable framework named semantic reasoning network (SRN) for accurate scene text recognition, where a global semantic reasoning module (GSRM) is introduced to capture global semantic context through multi-way…
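The abstract contrasts one-way serial transmission of semantic context (RNN-style) with global context capture. The toy sketch below illustrates only that difference; it is not SRN's or GSRM's implementation. Assumptions: each character is represented by a small hypothetical embedding vector, a running mean of earlier positions stands in for an RNN cell, and scaled dot-product attention over every *other* position stands in for global, multi-way context. The function names `serial_context` and `global_context` are illustrative.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def serial_context(embeddings: List[Vector]) -> List[Vector]:
    """One-way context: step t only sees steps < t, computed in order.
    A running mean stands in for an RNN cell (illustrative assumption)."""
    dim = len(embeddings[0])
    running = [0.0] * dim
    contexts = []
    for t, emb in enumerate(embeddings):
        # Context for step t is built only from earlier steps.
        contexts.append([r / t if t > 0 else 0.0 for r in running])
        running = [r + e for r, e in zip(running, emb)]
    return contexts


def global_context(embeddings: List[Vector]) -> List[Vector]:
    """Global context: each position attends to every other position
    (itself excluded), so later characters can inform earlier ones."""
    n, dim = len(embeddings), len(embeddings[0])
    scale = math.sqrt(dim)
    contexts = []
    for i in range(n):  # iterations are independent, so they could run in parallel
        others = [j for j in range(n) if j != i]
        if not others:
            contexts.append([0.0] * dim)
            continue
        scores = [dot(embeddings[i], embeddings[j]) / scale for j in others]
        m = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        ctx = [0.0] * dim
        for w, j in zip(exps, others):
            for k in range(dim):
                ctx[k] += (w / total) * embeddings[j][k]
        contexts.append(ctx)
    return contexts


if __name__ == "__main__":
    chars = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(serial_context(chars))
    print(global_context(chars))
```

In the serial version the first character receives no context and a change to the last character never reaches earlier positions. In the global version every position is computed independently from all the others, which is the property the abstract credits with better use of semantics and faster, non-sequential decoding.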
Videos
Towards Accurate Scene Text Recognition With Semantic Reasoning Networks (YouTube)
Taxonomy
Topics: Handwritten Text Recognition Techniques · Image Retrieval and Classification Techniques · Image Processing and 3D Reconstruction
Methods: Semantic Reasoning Network
