Reading Scene Text in Deep Convolutional Sequences

Pan He; Weilin Huang; Yu Qiao; Chen Change Loy; Xiaoou Tang

arXiv:1506.04395·cs.CV·December 22, 2015·38 cites

TL;DR

This paper introduces the Deep-Text Recurrent Network (DTRN), which treats scene text recognition as sequence labeling. It handles ambiguous words, image distortions, and unknown strings without character segmentation or a dictionary.

Contribution

The paper presents a framework that combines a CNN, which turns a whole word image into an ordered feature sequence, with an LSTM-based recurrent model that labels that sequence. This avoids character segmentation and allows recognition of arbitrary and unknown words.

Findings

01. Robust recognition of highly ambiguous words.

02. Effective handling of various image distortions.

03. Recognition of unknown words without dictionary reliance.

Abstract

We develop a Deep-Text Recurrent Network (DTRN) that regards scene text reading as a sequence labelling problem. We leverage recent advances of deep convolutional neural networks to generate an ordered high-level sequence from a whole word image, avoiding the difficult character segmentation problem. Then a deep recurrent model, building on long short-term memory (LSTM), is developed to robustly recognize the generated CNN sequences, departing from most existing approaches recognising each character independently. Our model has a number of appealing properties in comparison to existing scene text recognition methods: (i) It can recognise highly ambiguous words by leveraging meaningful context information, allowing it to work reliably without either pre- or post-processing; (ii) the deep CNN feature is robust to various image distortions; (iii) it retains the explicit order information…
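The sequence-labelling view described in the abstract can be illustrated with a minimal sketch. It shows only the two ends of such a pipeline: cutting a whole word image into an ordered, left-to-right series of windows, and turning per-step labels into a word without segmenting characters. The CNN and LSTM in between are omitted. The names `sliding_windows` and `greedy_ctc_decode`, the window width and stride, and the blank symbol are all hypothetical. The CTC-style decoding step is an assumption suggested by the name of the linked repository, not something the abstract states.

```python
from itertools import groupby

BLANK = "-"  # hypothetical "no character" symbol, an assumption of this sketch


def sliding_windows(image, width, stride):
    """Cut a word image (a list of pixel rows) into an ordered,
    left-to-right list of fixed-width column windows."""
    n_cols = len(image[0])
    return [
        [row[x:x + width] for row in image]
        for x in range(0, n_cols - width + 1, stride)
    ]


def greedy_ctc_decode(step_labels, blank=BLANK):
    """Collapse runs of identical labels, then drop blanks (CTC-style)."""
    collapsed = [label for label, _ in groupby(step_labels)]
    return "".join(label for label in collapsed if label != blank)


# Toy 4x10 "image": in a real system each window would be passed through
# a CNN and the resulting feature sequence through a recurrent model.
image = [[0] * 10 for _ in range(4)]
windows = sliding_windows(image, width=4, stride=2)

# Stand-in for the per-step labels a trained recurrent model might emit.
step_labels = list("hh-e-ll-lo--")
word = greedy_ctc_decode(step_labels)
```

Because labels are assigned to every step of the ordered sequence and only merged afterwards, no step requires deciding where one character ends and the next begins. The blank between the two runs of `l` is what keeps the doubled letter from being merged into one.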

Code & Models

Repositories

somitmittal/Reading-Scene-Text-from-Images-using-Tensorflow-CNN-Bidirectional-LSTM-CTC-Loss (TensorFlow)

Taxonomy

Topics: Handwritten Text Recognition Techniques · Natural Language Processing Techniques · Topic Modeling