STRIDE : Scene Text Recognition In-Device
Rachit S Munjal, Arun D Prabhu, Nikhil Arora, Sukumar Moharana, Gopi Ramena

TL;DR
This paper presents a lightweight, real-time scene text recognition system optimized for on-device deployment, combining convolution attention modules and a novel orientation classifier to achieve high accuracy with minimal computational resources.
Contribution
The authors introduce a compact, efficient scene text recognition model with convolution attention modules and an orientation classifier, enabling real-time on-device OCR with high accuracy.
Findings
Model has only 0.88M parameters and runs in 2.44 ms per word.
Achieves 88.4% accuracy on the ICDAR-13 dataset.
Outperforms existing on-device OCR systems in both inference time and memory footprint.
Abstract
Optical Character Recognition (OCR) systems have been widely used in various applications for extracting semantic information from images. To give the user more control over their privacy, an on-device solution is needed. The current state-of-the-art models are too heavy and complex to be deployed on-device. We develop an efficient lightweight scene text recognition (STR) system, which has only 0.88M parameters and performs real-time text recognition. Attention modules tend to boost the accuracy of STR networks but are generally slow and not optimized for device inference. So, we propose the use of convolution attention modules to the text recognition networks, which aims to provide channel and spatial attention information to the LSTM module by adding very minimal computational cost. It boosts our word accuracy on ICDAR 13 dataset by almost 2%. We also introduce a novel orientation…
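The abstract describes convolution attention modules that supply channel and spatial attention to the LSTM at low cost, but the excerpt does not give their exact design. The following is a minimal, dependency-free sketch of one common way to do this (channel gating followed by spatial gating, in the style of CBAM). It is an illustrative assumption, not the authors' implementation: the function names, the scalar weights `w_avg` and `w_max` (standing in for learned layers), and the use of a 1x1 weighted sum instead of a learned convolution are all hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(fmap, w_avg=1.0, w_max=1.0):
    """Scale each channel by a sigmoid gate built from its pooled statistics.

    fmap is a list of C channels, each an H x W list of lists.
    w_avg and w_max are hypothetical scalars standing in for the small
    shared MLP that a trained module would learn.
    """
    out = []
    for channel in fmap:
        flat = [v for row in channel for v in row]
        gate = sigmoid(w_avg * sum(flat) / len(flat) + w_max * max(flat))
        out.append([[v * gate for v in row] for row in channel])
    return out


def spatial_attention(fmap, w_avg=1.0, w_max=1.0):
    """Scale each spatial position by a sigmoid gate built from cross-channel statistics.

    A trained module would typically run a small convolution over the pooled
    maps; this sketch uses a 1x1 weighted sum (an assumption) for brevity.
    """
    channels, height, width = len(fmap), len(fmap[0]), len(fmap[0][0])
    gates = [[0.0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            column = [fmap[c][i][j] for c in range(channels)]
            gates[i][j] = sigmoid(w_avg * sum(column) / channels + w_max * max(column))
    return [
        [[fmap[c][i][j] * gates[i][j] for j in range(width)] for i in range(height)]
        for c in range(channels)
    ]


def conv_attention(fmap):
    """Apply channel gating, then spatial gating; the output keeps the input shape."""
    return spatial_attention(channel_attention(fmap))
```

Because both gates are element-wise multiplications by values in (0, 1), the output has the same C x H x W shape as the input and can be passed on to a recurrent layer unchanged, which is consistent with the abstract's claim of very small added cost.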
Taxonomy
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory · Convolution
