Scene Text Recognition with Temporal Convolutional Encoder
Xiangcheng Du, Tianlong Ma, Yingbin Zheng, Hao Ye, Xingjiao Wu, Liang He

TL;DR
This paper introduces a Temporal Convolutional Encoder for scene text recognition that captures long-term dependencies, improving accuracy over traditional sequence-to-sequence models.
Contribution
The paper proposes a novel Temporal Convolutional Encoder that enhances scene text recognition by modeling long-term temporal dependencies within the encoder stage.
Findings
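Increasing the encoder's sequential extent improves recognition accuracy in comparisons on seven datasets.
The choice of attention module in the convolutional blocks affects how accurately text representations are learned.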
Temporal Convolutional Encoder outperforms existing methods.
Abstract
Text in scene images typically consists of several characters and exhibits a characteristic sequence structure. Existing methods capture this structure with sequence-to-sequence models, in which an encoder extracts visual representations and a decoder translates the features into the label sequence. In this paper, we study a text recognition framework that considers the long-term temporal dependencies in the encoder stage. We demonstrate that the proposed Temporal Convolutional Encoder with increased sequential extents improves the accuracy of text recognition. We also study the impact of different attention modules in convolutional blocks for learning accurate text representations. We conduct comparisons on seven datasets, and the experiments demonstrate the effectiveness of our proposed approach.
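The abstract does not say how the encoder's sequential extent is increased. One common way in temporal convolutional networks is to stack 1D convolutions with growing dilation, so the sketch below assumes that mechanism purely for illustration; it is not the authors' architecture. The names `receptive_field`, `dilated_conv1d`, and `encode`, the odd kernel size, the zero padding, and the dilation schedule are all hypothetical choices.

```python
from typing import List, Sequence


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Input time steps that can influence one output step of a conv stack.

    Each layer with dilation d widens the extent by (kernel_size - 1) * d.
    """
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def dilated_conv1d(x: List[float], weights: Sequence[float], dilation: int) -> List[float]:
    """Same-length 1D dilated convolution with a centered kernel and zero padding.

    Assumes an odd kernel size so the kernel can be centered on each step.
    """
    half = (len(weights) - 1) // 2
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(weights):
            idx = t + (j - half) * dilation  # taps are spaced `dilation` steps apart
            if 0 <= idx < len(x):            # positions outside the sequence count as zero
                acc += w * x[idx]
        out.append(acc)
    return out


def encode(x: List[float], weights: Sequence[float], dilations: Sequence[int]) -> List[float]:
    """Apply one dilated convolution per entry in `dilations` (no nonlinearity, for clarity)."""
    for d in dilations:
        x = dilated_conv1d(x, weights, d)
    return x


if __name__ == "__main__":
    # Same depth and kernel size, different dilation schedules.
    print(receptive_field(3, [1, 1, 1, 1]))  # undilated stack
    print(receptive_field(3, [1, 2, 4, 8]))  # exponentially dilated stack

    # Feed a single impulse through the stack and count the positions it reaches.
    impulse = [0.0] * 41
    impulse[20] = 1.0
    response = encode(impulse, [1.0, 1.0, 1.0], [1, 2, 4])
    print(sum(1 for v in response if v != 0.0))
```

Under these assumptions, four undilated layers with kernel size 3 see 9 time steps, while the same four layers with dilations 1, 2, 4, 8 see 31, which is the sense in which an encoder can cover longer-range dependencies along the character sequence without adding depth.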
Taxonomy
Topics: Handwritten Text Recognition Techniques · Natural Language Processing Techniques · Image Processing and 3D Reconstruction
