HAND: Hierarchical Attention Network for Multi-Scale Handwritten Document Recognition and Layout Analysis
Mohammed Hamdan, Abderrahmane Rahiche, Mohamed Cheriet

TL;DR
HAND is an end-to-end model that combines hierarchical attention with multi-scale processing to improve handwritten document recognition and layout analysis, particularly for complex and ancient manuscripts.
Contribution
The paper introduces HAND, a novel segmentation-free architecture that performs text recognition and layout analysis simultaneously. It integrates a convolutional encoder built from Gated Depth-wise Separable and Octave Convolutions, multi-scale adaptive processing, and a hierarchical attention decoder.
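This page does not describe how HAND's hierarchical attention is built. As a rough illustration of the general idea only, the sketch below applies standard scaled dot-product attention, softmax(QK^T / sqrt(d_k))V from "Attention Is All You Need" (listed in the taxonomy), at two levels: first over the feature vectors inside each text line, then over the resulting line summaries. The function names and the line/page split are hypothetical and are not the authors' decoder.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def softmax(scores: Vector) -> Vector:
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Compute softmax(Q K^T / sqrt(d_k)) V with plain Python lists."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Attention output: weighted average of the value vectors.
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs


def hierarchical_attention(query: Vector, lines: List[Matrix]) -> Vector:
    """Hypothetical two-level attention: positions within lines, then lines within a page."""
    # Level 1: summarise each line by attending over its own feature vectors.
    line_summaries = [scaled_dot_product_attention([query], feats, feats)[0] for feats in lines]
    # Level 2: attend over the line summaries to get one page-level context vector.
    return scaled_dot_product_attention([query], line_summaries, line_summaries)[0]
```

The two-level structure lets the model weigh where to look inside a line separately from which line to look at, which is the usual motivation for hierarchical attention over long, multi-line inputs.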
Findings
Reduces the character error rate (CER) for line-level recognition by up to 59.8%.
Reduces page-level CER by 31.2%.
Achieves new state-of-the-art results in handwritten document recognition and layout analysis.
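Character error rate (CER) is the character-level edit distance between a predicted transcription and the reference, divided by the reference length. The sketch below computes CER and the relative reduction between two systems, assuming the percentages above are relative reductions rather than absolute percentage-point differences; the example strings and CER values are hypothetical.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character insertions, deletions and substitutions."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,              # delete r
                            curr[j - 1] + 1,          # insert h
                            prev[j - 1] + (r != h)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance normalised by the reference length."""
    if not ref:
        raise ValueError("reference transcription must be non-empty")
    return levenshtein(ref, hyp) / len(ref)


def relative_reduction(baseline_cer: float, new_cer: float) -> float:
    """Fraction of the baseline CER removed by the new system (0.598 means 59.8%)."""
    return (baseline_cer - new_cer) / baseline_cer
```

For example, a hypothetical drop from 10.0% to 4.02% CER is a 59.8% relative reduction, even though the absolute drop is only about 6 percentage points.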
Abstract
Handwritten document recognition (HDR) is one of the most challenging tasks in computer vision because of the wide variety of writing styles and the complex layouts inherent in handwritten texts. Traditionally, this problem has been approached as two separate tasks, handwritten text recognition and layout analysis, and existing approaches have struggled to integrate the two processes effectively. This paper introduces HAND (Hierarchical Attention Network for Multi-Scale Document), a novel end-to-end and segmentation-free architecture for simultaneous text recognition and layout analysis. Our model's key components include an advanced convolutional encoder integrating Gated Depth-wise Separable and Octave Convolutions for robust feature extraction, a Multi-Scale Adaptive Processing (MSAP) framework that dynamically adjusts to document complexity, and a hierarchical attention decoder with memory-augmented and…
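The abstract names Gated Depth-wise Separable Convolutions without giving their exact form. A minimal 1D sketch, assuming a GLU-style gate (the Gated Linear Unit appears in the taxonomy): a depth-wise convolution filters each channel independently, a point-wise (1x1) convolution produces 2·C_out channels, and the first half is multiplied element-wise by the sigmoid of the second half. HAND's encoder works on 2D document images and may be structured differently; all names and shapes here are hypothetical.

```python
import math
from typing import List

Signal = List[List[float]]  # indexed as [channel][time]


def depthwise_conv1d(x: Signal, kernels: List[List[float]]) -> Signal:
    """Filter each channel with its own odd-length kernel (zero 'same' padding, no channel mixing)."""
    out = []
    for channel, kernel in zip(x, kernels):
        pad = len(kernel) // 2
        padded = [0.0] * pad + channel + [0.0] * pad
        # Cross-correlation, as in most deep-learning "convolution" layers.
        out.append([sum(w * padded[t + i] for i, w in enumerate(kernel))
                    for t in range(len(channel))])
    return out


def pointwise_conv1d(x: Signal, weights: List[List[float]]) -> Signal:
    """1x1 convolution: each output channel is a weighted sum of all input channels."""
    length = len(x[0])
    return [[sum(w * x[c][t] for c, w in enumerate(row)) for t in range(length)]
            for row in weights]


def sigmoid(v: float) -> float:
    # Branch on the sign to avoid overflow in math.exp for large |v|.
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def gated_dw_separable_conv1d(x: Signal, dw_kernels: List[List[float]],
                              pw_weights: List[List[float]]) -> Signal:
    """Depth-wise conv, point-wise conv to 2*C_out channels, then a GLU-style gate."""
    h = pointwise_conv1d(depthwise_conv1d(x, dw_kernels), pw_weights)
    half = len(h) // 2
    content, gate = h[:half], h[half:]
    # Output = content * sigmoid(gate), element-wise.
    return [[a * sigmoid(b) for a, b in zip(c_row, g_row)]
            for c_row, g_row in zip(content, gate)]
```

Splitting the convolution this way needs roughly C·K + C·2C_out weights instead of C·K·2C_out for a full convolution with kernel length K, which is the usual motivation for depth-wise separable layers; the gate lets the layer learn to suppress uninformative features.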
Taxonomy
Topics: Handwritten Text Recognition Techniques · Image Processing and 3D Reconstruction · Vehicle License Plate Recognition
Methods: Gated Linear Unit · Byte Pair Encoding · Linear Layer · SentencePiece · Dropout · Softmax · Attention Is All You Need · Dense Connections · Inverse Square Root Schedule
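The taxonomy lists an inverse square root schedule, but this page does not give HAND's training settings. The sketch below shows the common warm-up form from "Attention Is All You Need", lr = d_model^-0.5 · min(step^-0.5, step · warmup_steps^-1.5), which rises linearly during warm-up and then decays with the inverse square root of the step. The defaults (d_model = 512, 4000 warm-up steps) come from that paper and are assumptions here, not HAND's values.

```python
def inverse_sqrt_lr(step: int, d_model: int = 512, warmup_steps: int = 4000) -> float:
    """Warm-up then inverse-square-root decay (Transformer-style schedule)."""
    step = max(step, 1)  # avoid 0 ** -0.5 at the first step
    # Linear warm-up branch: step * warmup^-1.5; decay branch: step^-0.5.
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)
```

The two branches of the min are equal at step == warmup_steps, so the learning rate peaks there; after that, quadrupling the step count halves the learning rate.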
