Reading Scene Text in Deep Convolutional Sequences
Pan He, Weilin Huang, Yu Qiao, Chen Change Loy, Xiaoou Tang

TL;DR
This paper introduces a Deep-Text Recurrent Network that treats scene text recognition as sequence labeling, effectively handling ambiguous words, distortions, and unknown strings without character segmentation or dictionary dependence.
Contribution
It presents a novel deep learning framework combining CNNs and LSTMs for scene text recognition, avoiding character segmentation and enabling recognition of arbitrary and unknown words.
Findings
Robust recognition of highly ambiguous words.
Effective handling of various image distortions.
Recognition of unknown words without dictionary reliance.
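To illustrate why sequence labelling needs no dictionary, here is a minimal sketch of turning per-frame labels into a word. It assumes a CTC-style greedy collapse with a blank label, which this summary does not spell out; the names `BLANK` and `collapse_frame_labels` are hypothetical and chosen only for illustration.

```python
from itertools import groupby

# Assumption: "-" marks a frame where no character is emitted (a "blank").
BLANK = "-"


def collapse_frame_labels(frame_labels: str) -> str:
    """Merge consecutive repeated labels, then drop blanks.

    The result is built character by character from the frame labels,
    so any string can be produced, including words absent from a lexicon.
    """
    merged = [label for label, _ in groupby(frame_labels)]
    return "".join(label for label in merged if label != BLANK)


# Hypothetical per-frame predictions for a 12-frame word image.
decoded = collapse_frame_labels("hh-e-ll-lo--")
```

The blank between the two runs of `l` is what keeps the double letter from being merged into one.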
Abstract
We develop a Deep-Text Recurrent Network (DTRN) that regards scene text reading as a sequence labelling problem. We leverage recent advances of deep convolutional neural networks to generate an ordered high-level sequence from a whole word image, avoiding the difficult character segmentation problem. Then a deep recurrent model, building on long short-term memory (LSTM), is developed to robustly recognize the generated CNN sequences, departing from most existing approaches recognising each character independently. Our model has a number of appealing properties in comparison to existing scene text recognition methods: (i) It can recognise highly ambiguous words by leveraging meaningful context information, allowing it to work reliably without either pre- or post-processing; (ii) the deep CNN feature is robust to various image distortions; (iii) it retains the explicit order information…
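The idea of generating an ordered sequence from a whole word image can be sketched as a left-to-right sliding window. This is an illustrative sketch only: the window width, stride, and the `mean_intensity` function standing in for the CNN feature extractor are assumptions, not values or components taken from the paper.

```python
from typing import List

Image = List[List[float]]  # grayscale image as rows of pixel values


def sliding_windows(image: Image, win_width: int, stride: int) -> List[Image]:
    """Cut a word image into overlapping patches, ordered left to right."""
    if win_width <= 0 or stride <= 0:
        raise ValueError("win_width and stride must be positive")
    width = len(image[0])
    if width < win_width:
        raise ValueError("image is narrower than the window")
    starts = range(0, width - win_width + 1, stride)
    return [[row[s:s + win_width] for row in image] for s in starts]


def mean_intensity(patch: Image) -> float:
    """Hypothetical stand-in for a CNN: one number per patch."""
    pixels = [p for row in patch for p in row]
    return sum(pixels) / len(pixels)


# Toy 4x10 "word image" whose brightness increases from left to right.
image = [[float(col) for col in range(10)] for _ in range(4)]
patches = sliding_windows(image, win_width=4, stride=2)

# The ordered feature sequence a recurrent model would consume.
sequence = [mean_intensity(p) for p in patches]
```

No character boundaries are located at any point: the patch order alone preserves the reading order, and a recurrent model is left to label the resulting sequence.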
Taxonomy
Topics: Handwritten Text Recognition Techniques · Natural Language Processing Techniques · Topic Modeling
