Implicit Language Model in LSTM for OCR
Ekraam Sabir, Stephen Rawls, Prem Natarajan

TL;DR
This paper investigates how LSTM networks implicitly learn a language model when trained for OCR. It shows that they use up to 5 characters of context and that the implicit language model yields a 2.4% CER improvement on a synthetic test set compared with a test set of random characters.
Contribution
It is the first study to characterize the implicit language model learned by LSTMs in OCR, quantifying its context size and impact on performance.
Findings
LSTMs learn an implicit language model in OCR.
The implicit LM uses up to 5 characters of context.
The implicit LM yields a 2.4% CER improvement on the synthetic test set compared with a test set of random characters.
Abstract
Neural networks have become the technique of choice for OCR, but many aspects of how and why they deliver superior performance are still unknown. One key difference between current neural network techniques using LSTMs and the previous state-of-the-art HMM systems is that HMM systems have a strong independence assumption. In comparison, LSTMs have no explicit constraints on the amount of context that can be considered during decoding. In this paper we show that they learn an implicit LM and attempt to characterize the strength of the LM in terms of equivalent n-gram context. We show that this implicitly learned language model provides a 2.4% CER improvement on our synthetic test set when compared against a test set of random characters (i.e. not naturally occurring sequences), and that the LSTM learns to use up to 5 characters of context (which is roughly 88 frames in our…
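The results above are reported as character error rate (CER). As a minimal sketch of how such a figure is commonly computed (not the authors' evaluation code), CER can be taken as the total character-level edit distance between reference and recognized text, divided by the total number of reference characters. The function names `edit_distance` and `cer` are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: insertions, deletions, substitutions each cost 1."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty string
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete r from the reference
                curr[j - 1] + 1,     # insert h into the reference
                prev[j - 1] + cost,  # substitute r with h, or match
            ))
        prev = curr
    return prev[-1]


def cer(references: Sequence[str], hypotheses: Sequence[str]) -> float:
    """Corpus-level CER: total edits divided by total reference characters."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    return total_edits / total_chars


if __name__ == "__main__":
    # One substitution (e -> a) and one deletion (o) against 11 reference characters.
    print(cer(["hello world"], ["hallo wrld"]))
```

Under this definition, a comparison like the one in the abstract would compute `cer` separately on the natural-language test set and on the random-character test set and then compare the two values.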
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Handwritten Text Recognition Techniques
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory
