Implicit Language Model in LSTM for OCR

Ekraam Sabir; Stephen Rawls; Prem Natarajan

arXiv:1805.09441·cs.CV·May 25, 2018

TL;DR

This paper investigates how LSTM networks implicitly learn a language model in OCR, showing that they use up to 5 characters of context and that this implicit model yields a 2.4% CER improvement on a synthetic test set relative to a test set of random characters.

Contribution

The paper characterizes the implicit language model learned by LSTMs in OCR, quantifying its strength as an equivalent n-gram context size and measuring its impact on recognition performance.

Findings

01. LSTMs learn an implicit language model in OCR.

02. The implicit LM uses up to 5 characters of context.

03. The implicit LM yields a 2.4% CER improvement on the synthetic test set compared with a test set of random characters.

Abstract

Neural networks have become the technique of choice for OCR, but many aspects of how and why they deliver superior performance are still unknown. One key difference between current neural network techniques using LSTMs and the previous state-of-the-art HMM systems is that HMM systems have a strong independence assumption. In comparison, LSTMs have no explicit constraints on the amount of context that can be considered during decoding. In this paper we show that they learn an implicit LM and attempt to characterize the strength of the LM in terms of equivalent n-gram context. We show that this implicitly learned language model provides a 2.4% CER improvement on our synthetic test set when compared against a test set of random characters (i.e. not naturally occurring sequences), and that the LSTM learns to use up to 5 characters of context (which is roughly 88 frames in our…
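The abstract reports its main result as a difference in character error rate (CER) between two test sets. As a minimal sketch of how such a figure is commonly computed, the Python below defines CER as total Levenshtein edit distance divided by total reference length. The function names `edit_distance` and `cer` are illustrative, not taken from the paper or its repository, and the paper's exact scoring procedure may differ.

```python
from typing import Sequence


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between a reference and a hypothesis string."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if r == h else 1)
            deletion = prev[j] + 1       # drop a character of ref
            insertion = curr[j - 1] + 1  # extra character in hyp
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def cer(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Corpus-level CER: total edits divided by total reference characters."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    total_chars = sum(len(r) for r in refs)
    if total_chars == 0:
        raise ValueError("reference text is empty")
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return total_edits / total_chars


if __name__ == "__main__":
    # Hypothetical recognizer outputs, invented for illustration only.
    natural_refs, natural_hyps = ["the quick brown fox"], ["the quick brown fox"]
    random_refs, random_hyps = ["xq zrtk pvwmn jbf"], ["xq zrtk pvwnn jbf"]
    gap = cer(random_refs, random_hyps) - cer(natural_refs, natural_hyps)
    print(f"CER gap (random minus natural): {gap:.3f}")
```

Under this reading, evaluating the same model on natural-language lines and on random-character lines and subtracting the two CER values gives a gap of the kind the abstract attributes to the implicit language model.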

Code & Models

Repositories

amirabbasasadi/PersianOCR
pytorch

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Handwritten Text Recognition Techniques

Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory