Scan, Attend and Read: End-to-End Handwritten Paragraph Recognition with MDLSTM Attention

Théodore Bluche; Jérôme Louradour; Ronaldo Messina

arXiv:1604.03286·cs.CV·August 24, 2016

TL;DR

This paper introduces an end-to-end handwriting recognition model that uses attention implemented with a multi-dimensional LSTM (MDLSTM) network to transcribe entire paragraphs without prior segmentation into lines.

Contribution

The approach implements covert and overt attention with a multi-dimensional LSTM network for paragraph-level recognition, removing the prior line segmentation that earlier approaches depended on.

Findings

01

Successful end-to-end paragraph recognition on IAM Database

02

To the authors' knowledge, the first successful end-to-end multi-line handwriting recognition without prior line segmentation

03

Encouraging results suggest feasibility of full paragraph transcription

Abstract

We present an attention-based model for end-to-end handwriting recognition. Our system does not require any segmentation of the input paragraph. The model is inspired by the differentiable attention models presented recently for speech recognition, image captioning or translation. The main difference is the covert and overt attention, implemented as a multi-dimensional LSTM network. Our principal contribution towards handwriting recognition lies in the automatic transcription without a prior segmentation into lines, which was crucial in previous approaches. To the best of our knowledge this is the first successful attempt of end-to-end multi-line handwriting recognition. We carried out experiments on the well-known IAM Database. The results are encouraging and bring hope to perform full paragraph transcription in the near future.
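
The differentiable attention idea mentioned in the abstract can be illustrated with a minimal sketch: a grid of scores over image positions is normalized with a softmax, and the feature vectors are summed with those weights. This is only a toy illustration under stated assumptions, not the authors' implementation. In the paper the attention is produced by an MDLSTM network; here the scores are simply given as input, and the names `softmax_2d`, `attend`, `features`, and `scores` are hypothetical.

```python
import math

def softmax_2d(scores):
    """Normalize a 2D grid of scores into attention weights that sum to 1."""
    peak = max(s for row in scores for s in row)  # subtract max for stability
    exps = [[math.exp(s - peak) for s in row] for row in scores]
    total = sum(sum(row) for row in exps)
    return [[e / total for e in row] for row in exps]

def attend(features, scores):
    """Return attention weights and the weighted sum of feature vectors.

    features: H x W grid of feature vectors (lists of floats).
    scores:   H x W grid of unnormalized scores (assumed given here; in the
              paper they would come from an MDLSTM network).
    """
    weights = softmax_2d(scores)
    dim = len(features[0][0])
    summary = [0.0] * dim
    for i, row in enumerate(features):
        for j, vec in enumerate(row):
            for k in range(dim):
                summary[k] += weights[i][j] * vec[k]
    return weights, summary

# Toy 2 x 3 feature map with 2-dimensional features (made-up values).
features = [[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
            [[2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]]
# The score at position (1, 1) is much higher, so attention concentrates there.
scores = [[0.0, 0.0, 0.0],
          [0.0, 5.0, 0.0]]

weights, summary = attend(features, scores)
```

Because every step is differentiable, the weights can be trained jointly with the rest of the network, which is what lets such a model learn where to look without segmented lines as input.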

Taxonomy

Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory