TL;DR
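This paper introduces a neural network model for full-page handwritten text recognition that needs no image segmentation, handles arbitrary layouts, and can generate auxiliary markup. It sets a new state of the art for paragraph-level recognition on the IAM dataset and outperforms commercial HTR cloud APIs on real-world handwritten test answers.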
Contribution
The proposed Image to Sequence neural network architecture recognizes full pages of handwritten or printed text without image segmentation, and can also be trained to generate auxiliary markup related to formatting, layout and content.
Findings
Achieves state-of-the-art paragraph-level recognition on the IAM dataset.
Outperforms commercial HTR cloud APIs on scans of real-world, free-form handwritten test answers.
Deployed successfully in a commercial web application.
Abstract
We present a neural-network-based Handwritten Text Recognition (HTR) model architecture that can be trained to recognize full pages of handwritten or printed text without image segmentation. Being based on an Image to Sequence architecture, it can extract text present in an image and then sequence it correctly without imposing any constraints regarding orientation, layout and size of text and non-text. Further, it can also be trained to generate auxiliary markup related to formatting, layout and content. We use a character-level vocabulary, thereby supporting the language and terminology of any subject. The model achieves a new state of the art in paragraph-level recognition on the IAM dataset. When evaluated on scans of real-world handwritten free-form test answers - beset with curved and slanted lines, drawings, tables, math, chemistry and other symbols - it performs better than all commercially…
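The character-level vocabulary mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the special tokens (`<pad>`, `<sos>`, `<eos>`, `<unk>`) and the helper names `build_vocab`, `encode` and `decode` are assumptions made for illustration only. The point it shows is that any string, including chemistry or math notation, maps to a sequence of per-character ids that a sequence decoder could be trained to emit.

```python
from typing import Dict, List

# Hypothetical special tokens; the paper's actual token set is not stated here.
SPECIALS = ["<pad>", "<sos>", "<eos>", "<unk>"]


def build_vocab(texts: List[str]) -> Dict[str, int]:
    """Map each special token and each distinct corpus character to an integer id."""
    chars = sorted({ch for text in texts for ch in text})
    return {tok: i for i, tok in enumerate(SPECIALS + chars)}


def encode(text: str, vocab: Dict[str, int]) -> List[int]:
    """Turn a string into ids, wrapped in start and end markers."""
    unk = vocab["<unk>"]
    body = [vocab.get(ch, unk) for ch in text]  # unseen characters map to <unk>
    return [vocab["<sos>"]] + body + [vocab["<eos>"]]


def decode(ids: List[int], vocab: Dict[str, int]) -> str:
    """Turn ids back into a string, stopping at the first end marker."""
    inverse = {i: tok for tok, i in vocab.items()}
    out = []
    for i in ids:
        tok = inverse[i]
        if tok == "<eos>":
            break
        if tok in SPECIALS:
            continue  # drop <pad>, <sos>, <unk>
        out.append(tok)
    return "".join(out)


vocab = build_vocab(["H2O + NaCl", "x^2"])
ids = encode("NaCl", vocab)
restored = decode(ids, vocab)
```

Because the units are single characters rather than words, no dictionary of subject-specific terms is needed; the vocabulary only has to cover the characters that occur in the training text.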
