Reading Scene Text with Attention Convolutional Sequence Modeling
Yunze Gao (1, 2), Yingying Chen (1, 2), Jinqiao Wang (1, 2), Hanqing Lu (1, 2) ((1) National Lab of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, (2) University of Chinese Academy of Sciences)

TL;DR
This paper introduces an end-to-end convolutional attention network for scene text recognition that is faster and more efficient than RNN-based methods, achieving state-of-the-art results on standard benchmarks.
Contribution
The paper proposes replacing RNNs with convolutional layers and residual attention modules for scene text recognition, improving speed and discriminability.
Findings
Convolutional sequence modeling is 9 times faster than a Bidirectional LSTM (BLSTM)
Achieves state-of-the-art performance on benchmarks
Effective in suppressing background noise
Abstract
Reading text in the wild is a challenging task in the field of computer vision. Existing approaches mainly adopt Connectionist Temporal Classification (CTC) or attention models based on Recurrent Neural Networks (RNNs), which are computationally expensive and hard to train. In this paper, we present an end-to-end Attention Convolutional Network for scene text recognition. First, instead of an RNN, we adopt stacked convolutional layers to effectively capture the contextual dependencies of the input sequence, an approach characterized by lower computational complexity and easier parallel computation. Compared to the chain structure of recurrent networks, the Convolutional Neural Network (CNN) provides a natural way to capture long-term dependencies between elements and is 9 times faster than Bidirectional Long Short-Term Memory (BLSTM). Furthermore, in order to enhance the…
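The abstract's point that stacked convolutions can capture context along a sequence can be illustrated with a small, self-contained sketch. It is not the paper's network: the kernel size, layer count, and the function names `receptive_field`, `conv1d_same`, and `stacked_conv` are assumptions chosen for illustration. The sketch shows that each extra stride-1 layer widens the span of input positions that influence one output position, and that every output position depends only on the layer's input, so positions can be computed independently rather than step by step as in a recurrent chain.

```python
def receptive_field(num_layers, kernel_size):
    """Input positions visible to one output position after stacking
    `num_layers` stride-1, undilated 1-D convolutions."""
    return 1 + num_layers * (kernel_size - 1)


def conv1d_same(seq, kernel):
    """Stride-1 1-D convolution with zero padding (odd kernel length),
    so the output has the same length as the input.

    Each output value is computed from the input only, never from
    another output, so all positions could be evaluated in parallel.
    """
    k = len(kernel)
    pad = k // 2
    padded = [0.0] * pad + list(seq) + [0.0] * pad
    return [
        sum(kernel[j] * padded[i + j] for j in range(k))
        for i in range(len(seq))
    ]


def stacked_conv(seq, kernel, num_layers):
    """Apply the same (hypothetical) kernel `num_layers` times in a row."""
    out = list(seq)
    for _ in range(num_layers):
        out = conv1d_same(out, kernel)
    return out


# A single impulse in the middle of a length-11 sequence.
impulse = [0.0] * 11
impulse[5] = 1.0

# Two layers of a size-3 kernel spread the impulse over 5 positions,
# matching receptive_field(2, 3).
spread = stacked_conv(impulse, [1.0, 1.0, 1.0], 2)
```

Here `spread` is nonzero at exactly five positions centered on index 5, which is the receptive field of two size-3 layers. Deeper stacks or larger kernels widen this span further, which is the sense in which a purely convolutional model can relate distant elements of the sequence.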
Taxonomy
Topics: Handwritten Text Recognition Techniques · Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques
