End-to-End Text Recognition with Hybrid HMM Maxout Models
Ouais Alsharif, Joelle Pineau

TL;DR
This paper presents an end-to-end text recognition system for natural scenes that combines Maxout neural networks with hybrid HMM models, reporting state-of-the-art accuracy on the ICDAR 2003 and SVT benchmarks.
Contribution
It proposes new solutions to character and word recognition, built on Maxout networks and hybrid HMM models, and combines them into an end-to-end scene text recognition system that outperforms previous methods on ICDAR 2003 and SVT.
Findings
Achieved top accuracy on ICDAR 2003 dataset
Outperformed existing methods on SVT benchmark
Built a tunable and highly accurate recognition system
Abstract
The problem of detecting and recognizing text in natural scenes has proved to be more challenging than its counterpart in documents, with most of the previous work focusing on a single part of the problem. In this work, we propose new solutions to the character and word recognition problems and then show how to combine these solutions in an end-to-end text-recognition system. We do so by leveraging the recently introduced Maxout networks along with hybrid HMM models that have proven useful for voice recognition. Using these elements, we build a tunable and highly accurate recognition system that beats state-of-the-art results on all the sub-problems for both the ICDAR 2003 and SVT benchmark datasets.
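For readers unfamiliar with the Maxout networks mentioned in the abstract, the following is a minimal sketch of a maxout layer: each unit outputs the maximum over several affine functions ("pieces") of its input. The function name `maxout`, the nested-list weight layout, and the toy values are assumptions chosen for illustration; this is not the authors' implementation and says nothing about the architecture used in the paper.

```python
def maxout(x, weights, biases):
    """Sketch of a maxout layer (illustrative layout, not from the paper).

    x            : list of floats, the input vector
    weights[j][i]: weight vector for piece i of output unit j
    biases[j][i] : bias for piece i of output unit j
    Returns one activation per unit: the max over that unit's affine pieces.
    """
    activations = []
    for unit_weights, unit_biases in zip(weights, biases):
        # Evaluate every affine piece w . x + b for this unit.
        pieces = [
            sum(w_d * x_d for w_d, x_d in zip(w, x)) + b
            for w, b in zip(unit_weights, unit_biases)
        ]
        # The unit's output is the largest piece.
        activations.append(max(pieces))
    return activations


# Toy example: one unit with two pieces, w = +1 and w = -1, computes |x|.
toy_weights = [[[1.0], [-1.0]]]
toy_biases = [[0.0, 0.0]]
print(maxout([-3.0], toy_weights, toy_biases))  # [3.0]
```

The toy example illustrates why maxout units are flexible: with two pieces a single unit can represent the absolute value function, and more pieces allow other piecewise-linear convex shapes.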
Taxonomy
Topics: Handwritten Text Recognition Techniques · Speech Recognition and Synthesis · Natural Language Processing Techniques
Methods: Maxout
