NRTR: A No-Recurrence Sequence-to-Sequence Model For Scene Text Recognition

Fenfen Sheng; Zhineng Chen; Bo Xu

arXiv:1806.00926·cs.CV·October 11, 2019·19 cites

TL;DR

NRTR is a no-recurrence, self-attention-based sequence-to-sequence model for scene text recognition. By eliminating recurrence and convolution, it reports state-of-the-art results with significantly shorter training time.

Contribution

The paper presents what the authors describe as the first no-recurrence, self-attention-based scene text recognizer, reducing complexity and training time while maintaining high accuracy.

Findings

01. Achieves state-of-the-art or competitive performance on benchmarks.

02. Requires at least 8 times less training time than previous models.

03. Effectively handles regular and irregular scene text.

Abstract

Scene text recognition has attracted a great deal of research because of its importance to various applications. Existing methods mainly adopt recurrence- or convolution-based networks. Although they have obtained good performance, these methods still suffer from two limitations: slow training speed due to the internal recurrence of RNNs, and high complexity due to stacked convolutional layers for long-term feature extraction. This paper, for the first time, proposes a no-recurrence sequence-to-sequence text recognizer, named NRTR, that dispenses with recurrences and convolutions entirely. NRTR follows the encoder-decoder paradigm, where the encoder uses stacked self-attention to extract image features, and the decoder applies stacked self-attention to recognize texts based on the encoder output. NRTR relies solely on the self-attention mechanism and thus could be trained with more parallelization and less…
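To make "stacked self-attention" concrete, here is a minimal sketch of scaled dot-product self-attention in plain Python. It is an illustration under stated assumptions, not the paper's implementation: queries, keys, and values are all taken to be the input features themselves (no learned projections, no multiple heads, no positional encoding), and the names `self_attention` and `stacked_self_attention` are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Scaled dot-product self-attention over n feature vectors of size d.

    Assumption: Q = K = V = x, i.e. no learned projection matrices.
    Every position attends to every other position in one step, with no
    recurrence over the sequence.
    """
    d = len(x[0])
    scale = math.sqrt(d)
    out = []
    for q in x:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in x]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, x)) for j in range(d)])
    return out


def stacked_self_attention(x, num_layers=2):
    """Apply several self-attention layers in sequence (hypothetical depth)."""
    for _ in range(num_layers):
        x = self_attention(x)
    return x


if __name__ == "__main__":
    # Three toy "image feature" vectors of dimension 2.
    feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for row in stacked_self_attention(feats):
        print(row)
```

Because each output row depends only on the full input of the previous layer, all positions can be computed independently, which is the source of the extra parallelization the abstract mentions compared with step-by-step recurrence.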

Code & Models

Models

topdu/OpenOCR · model · ♡ 5

Datasets

dlxjj/OpenOCR · dataset · 1.3k dl

Taxonomy

Topics: Handwritten Text Recognition Techniques · Image Retrieval and Classification Techniques · Image Processing and 3D Reconstruction

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Convolution