Deep Structured Output Learning for Unconstrained Text Recognition
Max Jaderberg, Karen Simonyan, Andrea Vedaldi, Andrew Zisserman

TL;DR
This paper introduces a CNN-CRF based model for unconstrained text recognition in natural images, capable of handling variable length and no fixed lexicon, trained solely on synthetic data, and achieving state-of-the-art results.
Contribution
The paper presents a novel joint CNN-CRF architecture for unconstrained text recognition that integrates character and N-gram predictions, trained end-to-end with synthetic data.
Findings
Outperforms character-only models on real-world benchmarks.
Achieves state-of-the-art accuracy in lexicon-constrained scenarios.
Effective on random alphanumeric strings without language models.
Abstract
We develop a representation suitable for the unconstrained recognition of words in natural images: the general case of no fixed lexicon and unknown length. To this end we propose a convolutional neural network (CNN) based architecture which incorporates a Conditional Random Field (CRF) graphical model, taking the whole word image as a single input. The unaries of the CRF are provided by a CNN that predicts characters at each position of the output, while higher order terms are provided by another CNN that detects the presence of N-grams. We show that this entire model (CRF, character predictor, N-gram predictor) can be jointly optimised by back-propagating the structured output loss, essentially requiring the system to perform multi-task learning, and training uses purely synthetically generated data. The resulting model is a more accurate system on standard real-world text…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsHandwritten Text Recognition Techniques · Image Processing and 3D Reconstruction · Natural Language Processing Techniques
