Reading Scene Text with Attention Convolutional Sequence Modeling

Yunze Gao (1, 2), Yingying Chen (1, 2), Jinqiao Wang (1, 2), Hanqing Lu (1, 2) ((1) National Lab of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences; (2) University of Chinese Academy of Sciences)

arXiv:1709.04303 · cs.CV · September 14, 2017 · 55 cites

TL;DR

This paper introduces an end-to-end convolutional attention network for scene text recognition that is faster and more efficient than RNN-based methods, achieving state-of-the-art results on standard benchmarks.

Contribution

The paper proposes replacing RNNs with stacked convolutional layers for sequence modeling and adding residual attention modules, improving both speed and discriminability.
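
This section does not describe how the residual attention module is built, so the following is only a minimal sketch under an explicit assumption: that the module rescales features with a soft mask in the residual form out = (1 + M) · F, as in residual attention networks for image classification. The names `residual_attention`, `sigmoid`, and `mask_logits` are hypothetical, and a list of scalars stands in for a feature map.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def residual_attention(features: List[float], mask_logits: List[float]) -> List[float]:
    """Hypothetical helper: rescale features by (1 + sigmoid(mask)).

    ASSUMPTION: this residual form is borrowed from residual attention
    networks; the paper's actual module may differ. The soft mask lies
    between 0 and 1, so each feature is multiplied by a factor between
    1 and 2: high-mask positions are emphasized relative to low-mask
    (background) positions, while the identity term keeps the signal.
    """
    if len(features) != len(mask_logits):
        raise ValueError("features and mask_logits must have the same length")
    return [(1.0 + sigmoid(m)) * f for f, m in zip(features, mask_logits)]


if __name__ == "__main__":
    # One "text-like" position (high mask) and one "background" position (low mask).
    print(residual_attention([1.0, 1.0], [10.0, -10.0]))
```

With equal input features, the high-mask position comes out close to 2.0 and the low-mask position stays close to 1.0. Under this assumed form, background is suppressed relative to text rather than zeroed out, which keeps gradients flowing through the identity path.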

Findings

1. Runs 9 times faster than a bidirectional LSTM (BLSTM) for sequence modeling.
2. Achieves state-of-the-art performance on standard benchmarks.
3. Effectively suppresses background noise.

Abstract

Reading text in the wild is a challenging task in the field of computer vision. Existing approaches mainly adopted Connectionist Temporal Classification (CTC) or attention models based on Recurrent Neural Networks (RNNs), which are computationally expensive and hard to train. In this paper, we present an end-to-end Attention Convolutional Network for scene text recognition. First, instead of an RNN, we adopt stacked convolutional layers to effectively capture the contextual dependencies of the input sequence, which offers lower computational complexity and easier parallel computation. Compared to the chain structure of recurrent networks, the Convolutional Neural Network (CNN) provides a natural way to capture long-term dependencies between elements and is 9 times faster than a Bidirectional Long Short-Term Memory (BLSTM) network. Furthermore, in order to enhance the…
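
The claim that stacked convolutions capture context without recurrence can be illustrated with a minimal sketch. It assumes plain 1-D convolutions with an odd kernel size and zero padding over a sequence of scalar features. The actual network uses multi-channel features and its own layer configuration, which this section does not specify, and the helper names `conv1d_same`, `stacked_conv`, and `receptive_field` are hypothetical. Each output position depends only on the previous layer, so all positions in a layer can be computed in parallel, whereas an RNN must compute h_t after h_{t-1}. The receptive field grows linearly with depth: 1 + L(k - 1) for L layers of kernel size k.

```python
from typing import List


def conv1d_same(x: List[float], kernel: List[float]) -> List[float]:
    """1-D cross-correlation with zero padding; output length equals input length."""
    k = len(kernel)
    if k % 2 == 0:
        raise ValueError("kernel size must be odd")
    r = k // 2  # half-width of the kernel
    n = len(x)
    out: List[float] = []
    # Each output position t reads only the previous layer, so the iterations
    # are independent and could run in parallel (unlike h_t = f(h_{t-1}, x_t)).
    for t in range(n):
        acc = 0.0
        for j in range(k):
            i = t + j - r
            if 0 <= i < n:  # positions outside the sequence act as zeros
                acc += kernel[j] * x[i]
        out.append(acc)
    return out


def stacked_conv(x: List[float], kernel: List[float], num_layers: int) -> List[float]:
    """Apply the same convolution num_layers times (no nonlinearity, for clarity)."""
    for _ in range(num_layers):
        x = conv1d_same(x, kernel)
    return x


def receptive_field(kernel_size: int, num_layers: int) -> int:
    """Input positions that can influence one output (stride 1, no dilation)."""
    return 1 + num_layers * (kernel_size - 1)


if __name__ == "__main__":
    impulse = [0.0] * 20
    impulse[10] = 1.0
    y = stacked_conv(impulse, [1.0, 1.0, 1.0], num_layers=4)
    print(receptive_field(3, 4), [i for i, v in enumerate(y) if v != 0.0])
```

For example, four layers with kernel size 3 give a receptive field of 9. An impulse at position 10 of a length-20 input affects exactly output positions 6 through 14. Longer-range context comes from adding layers, not from stepping through the sequence.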

Taxonomy

Topics: Handwritten Text Recognition Techniques · Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques