Recursive Recurrent Nets with Attention Modeling for OCR in the Wild
Chen-Yu Lee, Simon Osindero

TL;DR
This paper introduces R²AM, a novel OCR method using recursive CNNs, attention, and implicit language modeling, achieving state-of-the-art results on multiple challenging natural scene text datasets.
Contribution
The paper proposes a unified recursive recurrent neural network with attention for lexicon-free OCR, combining efficient feature extraction, implicit language modeling, and end-to-end training.
Findings
Achieved state-of-the-art accuracy on the Street View Text dataset.
Validated effectiveness on the IIIT5k, ICDAR, and Synth90k datasets.
Demonstrated the benefits of attention and recursive CNNs in OCR.
Abstract
We present recursive recurrent neural networks with attention modeling (R²AM) for lexicon-free optical character recognition in natural scene images. The primary advantages of the proposed method are: (1) use of recursive convolutional neural networks (CNNs), which allow for parametrically efficient and effective image feature extraction; (2) an implicitly learned character-level language model, embodied in a recurrent neural network which avoids the need to use N-grams; and (3) the use of a soft-attention mechanism, allowing the model to selectively exploit image features in a coordinated way, and allowing for end-to-end training within a standard backpropagation framework. We validate our method with state-of-the-art performance on challenging benchmark datasets: Street View Text, IIIT5k, ICDAR and Synth90k.
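To make point (3) of the abstract concrete, here is a minimal sketch of one soft-attention step in plain Python. It is illustrative only and is not the paper's implementation: the function names, the toy feature vectors, and in particular the dot-product scoring are assumptions made for this example, since the abstract does not specify how attention scores are computed. The idea it shows is that the weights come from a softmax, so the context vector is a smooth weighted average of the image features, which is what lets the whole model train end to end with standard backpropagation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_attention(features, query):
    """Return (context, weights) for a single decoding step.

    features: list of T vectors (each a list of D floats), standing in
              for CNN image features.
    query:    list of D floats, standing in for the recurrent decoder state.
    Dot-product scoring is an assumption of this sketch, not the paper's
    scoring function.
    """
    # One scalar relevance score per feature vector.
    scores = [sum(f_d * q_d for f_d, q_d in zip(f, query)) for f in features]
    # Normalize scores into weights that are positive and sum to 1.
    weights = softmax(scores)
    # Context vector: weighted average of the feature vectors.
    dim = len(features[0])
    context = [
        sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)
    ]
    return context, weights


# Toy inputs (hypothetical values chosen for illustration).
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [2.0, 0.0]
context, weights = soft_attention(features, query)
```

In this toy run the first and third feature vectors score equally against the query and receive equal weight, while the second receives less. In a character decoder, a step like this would be repeated once per output character, with the query updated by the recurrent network each time.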
Taxonomy
Topics: Handwritten Text Recognition Techniques · Multimodal Machine Learning Applications · Image Retrieval and Classification Techniques
