HWNet v2: An Efficient Word Image Representation for Handwritten Documents

Praveen Krishnan; C.V. Jawahar

arXiv:1802.06194 · cs.CV · March 20, 2019 · 5 cites



Open Access

TL;DR

HWNet v2 introduces an efficient deep learning framework for handwritten word image representation. It combines synthetic pre-training data, an adapted ResNet-34 architecture, and realistic augmentations to achieve state-of-the-art word spotting performance across multiple datasets.

Contribution

The paper proposes HWNet v2, a novel deep convolutional neural network architecture with region of interest pooling, optimized for variable-sized handwritten word images, and demonstrates its effectiveness with synthetic pre-training and data augmentation.

Findings

01

Achieves around 0.90 mAP on the IAM dataset with a 32-dimensional representation.

02

Outperforms previous methods on standard handwritten datasets and historical manuscripts.

03

Validates the framework's applicability to printed documents in multiple languages.
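
The first finding reports word spotting quality as mean average precision (mAP). As a rough illustration of how that metric is computed, here is a minimal sketch in plain Python. It is not the authors' evaluation code: the function names are hypothetical, and it assumes each query's ranked list covers the whole retrieval collection, so every relevant word image appears somewhere in the list.

```python
from typing import Sequence


def average_precision(relevance: Sequence[bool]) -> float:
    """Average precision for one query.

    `relevance[k]` is True when the item at rank k+1 matches the query word.
    Assumption: the list ranks the full collection, so all relevant items
    are present and the count of hits equals the number of relevant items.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this recall point
    return precision_sum / hits if hits else 0.0


def mean_average_precision(ranked_lists: Sequence[Sequence[bool]]) -> float:
    """Mean of the per-query average precision values."""
    return sum(average_precision(r) for r in ranked_lists) / len(ranked_lists)


# Two toy queries: relevant items at ranks 1 and 3, then at rank 2.
queries = [[True, False, True], [False, True]]
score = mean_average_precision(queries)
```

In a word spotting setting, each ranked list would typically come from sorting the collection by descriptor similarity to the query, with the relevance flags taken from the ground-truth transcriptions.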

Abstract

We present a framework for learning an efficient holistic representation for handwritten word images. The proposed method uses a deep convolutional neural network with traditional classification loss. The major strengths of our work lie in: (i) the efficient usage of synthetic data to pre-train a deep network, (ii) an adapted version of the ResNet-34 architecture with the region of interest pooling (referred to as HWNet v2) which learns discriminative features for variable sized word images, and (iii) a realistic augmentation of training data with multiple scales and distortions which mimics the natural process of handwriting. We further investigate the process of transfer learning to reduce the domain gap between synthetic and real domain, and also analyze the invariances learned at different layers of the network using visualization techniques proposed in the literature. Our…
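
The abstract credits region of interest pooling with letting the network handle variable-sized word images. The sketch below illustrates the underlying idea only: a feature map of any height and width is divided into a fixed grid of bins, and each bin is max-pooled, so the output size no longer depends on the input size. It is a simplified stand-in written in plain Python, treating the whole map as a single region; the function name and bin-boundary rule are assumptions, not the paper's implementation.

```python
from typing import List, Sequence


def adaptive_max_pool(
    feature_map: Sequence[Sequence[float]], out_h: int, out_w: int
) -> List[List[float]]:
    """Max-pool a 2D feature map of any size down to out_h x out_w.

    Bin i along an axis of length n spans [floor(i*n/out), ceil((i+1)*n/out)),
    which guarantees every bin is non-empty, even when n < out.
    """
    in_h, in_w = len(feature_map), len(feature_map[0])
    pooled = []
    for i in range(out_h):
        r0 = (i * in_h) // out_h
        r1 = ((i + 1) * in_h + out_h - 1) // out_h  # integer ceiling
        row = []
        for j in range(out_w):
            c0 = (j * in_w) // out_w
            c1 = ((j + 1) * in_w + out_w - 1) // out_w  # integer ceiling
            row.append(
                max(feature_map[r][c] for r in range(r0, r1) for c in range(c0, c1))
            )
        pooled.append(row)
    return pooled


# A wide and a narrow feature map both map to the same 1 x 3 output shape.
wide = [[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12]]
narrow = [[1, 5], [3, 2]]
wide_pooled = adaptive_max_pool(wide, 1, 3)
narrow_pooled = adaptive_max_pool(narrow, 1, 3)
```

Because both outputs share one shape, the layers that follow can use fixed-size weights regardless of how long or short the input word image was.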


Taxonomy

Topics: Handwritten Text Recognition Techniques · Multimodal Machine Learning Applications · Natural Language Processing Techniques