HWNet v2: An Efficient Word Image Representation for Handwritten Documents
Praveen Krishnan, C.V. Jawahar

TL;DR
HWNet v2 is an efficient deep learning framework for representing handwritten word images. It combines synthetic pre-training data, an adapted ResNet-34 architecture, and realistic augmentations to achieve state-of-the-art word spotting performance across multiple datasets.
Contribution
The paper proposes HWNet v2, an adapted ResNet-34 convolutional network with region of interest pooling that learns discriminative features for variable-sized handwritten word images, and demonstrates its effectiveness when combined with synthetic pre-training and data augmentation.
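To illustrate why region of interest pooling helps with variable-sized word images, here is a minimal sketch of the general idea: a feature map of any height and width is divided into a fixed grid of bins, and the maximum in each bin is kept, so every word image yields an output of the same size. This is a generic illustration and not the paper's layer; the function name `roi_max_pool`, the bin-boundary rule, and the toy inputs are assumptions made for the example.

```python
def roi_max_pool(feature_map, out_h, out_w):
    """Max-pool a 2D feature map (list of rows) into a fixed out_h x out_w grid.

    Hypothetical helper for illustration only; not the HWNet v2 implementation.
    """
    in_h = len(feature_map)
    in_w = len(feature_map[0])
    pooled = []
    for i in range(out_h):
        # Bin boundaries along the height: floor for the start, ceil for the end,
        # so every bin contains at least one row even when in_h < out_h.
        r0 = (i * in_h) // out_h
        r1 = ((i + 1) * in_h + out_h - 1) // out_h
        row = []
        for j in range(out_w):
            # Same rule along the width.
            c0 = (j * in_w) // out_w
            c1 = ((j + 1) * in_w + out_w - 1) // out_w
            row.append(max(feature_map[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled


# Two toy "feature maps" of different widths, standing in for a short and a long word.
short_word = [[1, 2, 3, 4],
              [5, 6, 7, 8]]
long_word = [[1, 2, 3, 4, 5, 6],
             [7, 8, 9, 10, 11, 12]]

print(roi_max_pool(short_word, 2, 2))
print(roi_max_pool(long_word, 2, 2))
```

Both calls return a 2 x 2 grid regardless of the input width, which is the property that lets a fixed-size fully connected layer follow the convolutional features.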
Findings
Achieves around 0.90 mAP on the IAM dataset with a 32-dimensional representation.
Outperforms previous methods on standard handwritten datasets and historical manuscripts.
Validates the framework's applicability to printed documents in multiple languages.
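The mAP figure above is a retrieval metric. As a rough sketch of how mean average precision can be computed for query-by-example word spotting, the code below ranks all other word embeddings by distance to each query and averages the precision at every rank where a matching word appears. The evaluation protocol used in the paper (distance metric, query selection, handling of stopwords) is not described here, so the squared Euclidean distance, the leave-one-out querying, and the function names are assumptions for illustration.

```python
def average_precision(ranked_relevance):
    """AP for one query; ranked_relevance is a list of booleans, best match first."""
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(embeddings, labels):
    """Query-by-example mAP: each item queries all the others (assumed protocol)."""
    ap_values = []
    for q, (q_vec, q_label) in enumerate(zip(embeddings, labels)):
        # (squared Euclidean distance, is_relevant) for every other item.
        candidates = [
            (sum((a - b) ** 2 for a, b in zip(q_vec, vec)), label == q_label)
            for i, (vec, label) in enumerate(zip(embeddings, labels))
            if i != q
        ]
        # Skip queries that have no other instance of the same word.
        if not any(rel for _, rel in candidates):
            continue
        candidates.sort(key=lambda pair: pair[0])
        ap_values.append(average_precision([rel for _, rel in candidates]))
    return sum(ap_values) / len(ap_values) if ap_values else 0.0


# Toy 2-dimensional embeddings; a real system would use the learned representation.
toy_embeddings = [[0, 0], [0, 1], [5, 5], [5, 6]]
toy_labels = ["the", "the", "word", "word"]
print(mean_average_precision(toy_embeddings, toy_labels))
```

In this toy case each query's nearest neighbour is the other instance of the same word, so every query has a perfect ranking.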
Abstract
We present a framework for learning an efficient holistic representation for handwritten word images. The proposed method uses a deep convolutional neural network with traditional classification loss. The major strengths of our work lie in: (i) the efficient usage of synthetic data to pre-train a deep network, (ii) an adapted version of the ResNet-34 architecture with the region of interest pooling (referred to as HWNet v2) which learns discriminative features for variable sized word images, and (iii) a realistic augmentation of training data with multiple scales and distortions which mimics the natural process of handwriting. We further investigate the process of transfer learning to reduce the domain gap between synthetic and real domain, and also analyze the invariances learned at different layers of the network using visualization techniques proposed in the literature. Our…
Taxonomy
Topics: Handwritten Text Recognition Techniques · Multimodal Machine Learning Applications · Natural Language Processing Techniques
