On-Device Spatial Attention based Sequence Learning Approach for Scene Text Script Identification
Rutika Moharir, Arun D Prabhu, Sukumar Moharana, Gopi Ramena, and Rachit S Munjal

TL;DR
This paper introduces a lightweight, real-time on-device CNN-LSTM model with spatial attention for scene text script identification, suitable for resource-constrained mobile devices, achieving competitive accuracy and efficiency.
Contribution
The paper presents a novel, efficient CNN-LSTM architecture with spatial attention and residual blocks for script identification, optimized for on-device deployment.
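As a rough illustration of what a spatial attention step can look like, the sketch below pools a feature map across channels (average and max), convolves the two pooled maps into a single score map, and gates the features with a sigmoid. This summary does not describe the internals of the paper's module, so the formulation here is an assumption, chosen only because these operations appear in the page's Methods list. The function name `spatial_attention` and its parameters are hypothetical.

```python
import math


def spatial_attention(feature_map, kernel, bias=0.0):
    """Gate a feature map with a per-pixel attention mask.

    ASSUMPTION: this is a generic spatial attention formulation, not the
    paper's confirmed design.

    feature_map: C channels, each an H x W nested list of floats.
    kernel: 2 x k x k weights (index 0 for the average-pooled map,
            index 1 for the max-pooled map), k odd.
    Returns (gated_feature_map, mask).
    """
    channels = len(feature_map)
    height = len(feature_map[0])
    width = len(feature_map[0][0])

    # Pool across the channel axis, leaving two H x W maps.
    avg = [[sum(feature_map[c][i][j] for c in range(channels)) / channels
            for j in range(width)] for i in range(height)]
    mx = [[max(feature_map[c][i][j] for c in range(channels))
           for j in range(width)] for i in range(height)]

    k = len(kernel[0])
    pad = k // 2
    mask = [[0.0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            score = bias
            # Zero-padded 2D convolution over the two pooled maps.
            for p, pooled in enumerate((avg, mx)):
                for di in range(k):
                    for dj in range(k):
                        y, x = i + di - pad, j + dj - pad
                        if 0 <= y < height and 0 <= x < width:
                            score += kernel[p][di][dj] * pooled[y][x]
            # Sigmoid squashes the score into a (0, 1) gate.
            mask[i][j] = 1.0 / (1.0 + math.exp(-score))

    # Apply the same spatial mask to every channel.
    gated = [[[feature_map[c][i][j] * mask[i][j] for j in range(width)]
              for i in range(height)] for c in range(channels)]
    return gated, mask
```

In a trained network the kernel weights would be learned, so the mask can suppress distorted or uninformative regions before the features reach the LSTM.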
Findings
Achieves accuracy competitive with state-of-the-art methods.
Has a small model size of 1.1 million parameters.
Inference time is only 2.7 milliseconds.
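To put the two reported figures in perspective, the following back-of-the-envelope sketch converts the parameter count into an approximate weight-storage size and the latency into a sequential throughput. The bytes-per-parameter values are assumptions, since this summary does not state how the weights are stored, and the throughput figure ignores preprocessing and batching. The helper names are hypothetical.

```python
PARAMS = 1_100_000   # parameter count reported above
LATENCY_MS = 2.7     # inference time reported above

# ASSUMPTION: storage precision is not given in this summary.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def model_size_mb(params, bytes_per_param):
    """Approximate weight storage in megabytes (10**6 bytes)."""
    return params * bytes_per_param / 1e6


def throughput_per_second(latency_ms):
    """Images per second if inferences run back to back."""
    return 1000.0 / latency_ms


for name, nbytes in BYTES_PER_PARAM.items():
    print(f"{name}: {model_size_mb(PARAMS, nbytes):.1f} MB")
print(f"about {throughput_per_second(LATENCY_MS):.0f} inferences per second")
```

Under the 32-bit assumption the weights occupy roughly 4.4 MB, which is consistent with the paper's framing of the model as suitable for mobile deployment.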
Abstract
Automatic identification of script is an essential component of a multilingual OCR engine. In this paper, we present an efficient, lightweight, real-time and on-device spatial attention based CNN-LSTM network for scene text script identification, feasible for deployment on resource constrained mobile devices. Our network consists of a CNN, equipped with a spatial attention module which helps reduce the spatial distortions present in natural images. This allows the feature extractor to generate rich image representations while ignoring the deformities and thereby, enhancing the performance of this fine grained classification task. The network also employs residue convolutional blocks to build a deep network to focus on the discriminative features of a script. The CNN learns the text feature representation by identifying each character as belonging to a particular script and the long term…
Taxonomy
Topics: Handwritten Text Recognition Techniques · Speech Recognition and Synthesis · Image Retrieval and Classification Techniques
Methods: Sigmoid Activation · Average Pooling · Convolution · Tanh Activation · Max Pooling · Long Short-Term Memory
