Scene Text Recognition with Temporal Convolutional Encoder

Xiangcheng Du; Tianlong Ma; Yingbin Zheng; Hao Ye; Xingjiao Wu; Liang He

arXiv:1911.01051·cs.CV·February 18, 2020

Open Access

TL;DR

This paper introduces a Temporal Convolutional Encoder for scene text recognition that captures long-term dependencies, improving accuracy over traditional sequence-to-sequence models.

Contribution

The paper proposes a novel Temporal Convolutional Encoder that enhances scene text recognition by modeling long-term temporal dependencies within the encoder stage.

Findings

01  Improved recognition accuracy on seven datasets.

02  Effectiveness of attention modules in convolutional blocks.

03  Temporal Convolutional Encoder outperforms existing methods.

Abstract

Text in scene images typically consists of several characters and exhibits a characteristic sequence structure. Existing methods capture this structure with sequence-to-sequence models: an encoder produces the visual representations, and a decoder translates those features into the label sequence. In this paper, we study a text recognition framework that considers long-term temporal dependencies in the encoder stage. We demonstrate that the proposed Temporal Convolutional Encoder, with its increased sequential extent, improves the accuracy of text recognition. We also study the impact of different attention modules in convolutional blocks for learning accurate text representations. We conduct comparisons on seven datasets, and the experiments demonstrate the effectiveness of the proposed approach.
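
To make the idea of an "increased sequential extent" concrete, here is a minimal sketch of how stacking dilated 1-D convolutions widens the span of time steps that a single encoder output can see. The abstract does not spell out the encoder's layers, so the use of dilation, the kernel size, the dilation rates, and the helper names `receptive_field` and `dilated_conv1d` are all assumptions for illustration, borrowed from the common temporal convolutional network formulation rather than taken from the paper.

```python
def receptive_field(kernel_size, dilations):
    """Input time steps visible to one output step after stacking
    stride-1 1-D convolutions with the given dilation rates."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def dilated_conv1d(sequence, weights, dilation):
    """Same-length 1-D dilated convolution with zero padding.

    The kernel is centered on each time step; taps that fall outside
    the sequence contribute zero. Assumes an odd kernel length.
    """
    half = (len(weights) - 1) // 2
    out = []
    for t in range(len(sequence)):
        total = 0.0
        for j, w in enumerate(weights):
            idx = t + (j - half) * dilation
            if 0 <= idx < len(sequence):
                total += w * sequence[idx]
        out.append(total)
    return out


def impulse_support(length, kernel_size, dilations):
    """Push a unit impulse through the stack and count the nonzero outputs.

    With all-ones kernels the count equals the receptive field, as long
    as the sequence is long enough to avoid clipping at the borders.
    """
    signal = [0.0] * length
    signal[length // 2] = 1.0
    for d in dilations:
        signal = dilated_conv1d(signal, [1.0] * kernel_size, d)
    return sum(1 for v in signal if v != 0.0)


if __name__ == "__main__":
    # Hypothetical settings: kernel size 3, four layers.
    print(receptive_field(3, [1, 1, 1, 1]))   # undilated stack
    print(receptive_field(3, [1, 2, 4, 8]))   # doubling dilation rates
    print(impulse_support(41, 3, [1, 2, 4, 8]))
```

Under these assumed settings, four undilated layers cover 9 time steps while four layers with dilations 1, 2, 4, 8 cover 31, which is the sense in which a temporal convolutional stack can reach long-range dependencies without extra depth.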

Taxonomy

Topics: Handwritten Text Recognition Techniques · Natural Language Processing Techniques · Image Processing and 3D Reconstruction