CALText: Contextual Attention Localization for Offline Handwritten Text
Tayaba Anjum, Nazar Khan

TL;DR
This paper introduces CALText, an attention-based encoder-decoder model with a novel localization penalty for recognizing offline handwritten Urdu and Arabic text. It outperforms simple attention and multi-directional LSTM baselines.
Contribution
It proposes a new attention localization penalty and refines the ground-truth annotations of the only complete, publicly available handwritten Urdu dataset, advancing offline handwritten Urdu text recognition.
Findings
Outperforms simple attention models
Outperforms multi-directional LSTM models
Effective on Urdu and Arabic datasets
Abstract
Recognition of Arabic-like scripts such as Persian and Urdu is more challenging than recognition of Latin-based scripts. This is due to their two-dimensional structure, context-dependent character shapes, spaces and overlaps, and the placement of diacritics. Little research exists on offline handwritten Urdu text, even though Urdu is the 10th most spoken language in the world. We present an attention-based encoder-decoder model that learns to read Urdu in context. A novel localization penalty is introduced to encourage the model to attend to only one location at a time when recognizing the next character. In addition, we comprehensively refine the ground-truth annotations of the only complete and publicly available handwritten Urdu dataset. We evaluate the model on both Urdu and Arabic datasets and show that contextual attention localization outperforms both simple attention and multi-directional…
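The abstract does not give the exact form of the localization penalty, so the sketch below is only an illustration under stated assumptions. It shows one decoding step of generic additive (tanh) attention over encoder locations, plus a hypothetical penalty that rewards "attending to only one location at a time". It does this by averaging the entropy of each step's attention distribution, which is 0 when all attention mass sits on one location and log N when it is spread uniformly over N locations. The entropy penalty, the weight `lam`, and all function names here are assumptions for illustration. They are not the penalty or code defined by the CALText authors.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, x: Sequence[float]) -> Vector:
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax over a 1-D list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def additive_attention(query, keys, w_q, w_k, v):
    """Generic additive (Bahdanau-style) attention for one decoding step.

    Scores each encoder location i with e_i = v . tanh(W_q q + W_k k_i),
    normalizes the scores with softmax, and returns (weights, context),
    where context is the attention-weighted sum of the keys.
    """
    q_proj = matvec(w_q, query)
    scores = []
    for k in keys:
        hidden = [math.tanh(a + b) for a, b in zip(q_proj, matvec(w_k, k))]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))
    alpha = softmax(scores)
    dim = len(keys[0])
    context = [sum(a * k[d] for a, k in zip(alpha, keys)) for d in range(dim)]
    return alpha, context


def attention_entropy(alpha: Sequence[float]) -> float:
    """Shannon entropy of one attention distribution (0 when fully peaked)."""
    return -sum(a * math.log(a) for a in alpha if a > 0.0)


def localization_penalty(alphas: Sequence[Sequence[float]]) -> float:
    """HYPOTHETICAL penalty: mean attention entropy over decoding steps.

    Adding lam * localization_penalty(...) to the recognition loss would push
    each step's attention toward a single location. This is an illustrative
    stand-in, not the penalty defined in the CALText paper.
    """
    return sum(attention_entropy(a) for a in alphas) / len(alphas)


if __name__ == "__main__":
    keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 encoder locations, dim 2
    eye = [[1.0, 0.0], [0.0, 1.0]]               # toy projection matrices
    alpha, context = additive_attention([0.5, -0.5], keys, eye, eye, [1.0, 1.0])
    lam = 0.1                                    # hypothetical penalty weight
    print(alpha, context, lam * localization_penalty([alpha]))
```

An entropy term is one common way to encourage peaked attention. Other formulations, such as penalizing one minus the maximum attention weight, express the same goal.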
Taxonomy
Topics: Handwritten Text Recognition Techniques · Natural Language Processing Techniques · Topic Modeling
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory
