Boosting Modern and Historical Handwritten Text Recognition with Deformable Convolutions
Silvia Cascianelli, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara

TL;DR
This paper introduces deformable convolutions into handwritten text recognition models, significantly improving their ability to handle geometric variations in modern and historical handwritten documents, thereby enhancing digitization efforts.
Contribution
The paper proposes two deformable convolutional architectures for HTR and shows that they outperform standard fixed-grid convolutions on diverse datasets.
Findings
Deformable convolutions outperform standard convolutions in HTR accuracy.
The approach is effective on both modern and historical handwritten datasets.
Experimental results show improved robustness to writing style variability.
Abstract
Handwritten Text Recognition (HTR) in free-layout pages is a challenging image understanding task that can provide a relevant boost to the digitization of handwritten documents and reuse of their content. The task becomes even more challenging when dealing with historical documents due to the variability of the writing style and degradation of the page quality. State-of-the-art HTR approaches typically couple recurrent structures for sequence modeling with Convolutional Neural Networks for visual feature extraction. Since convolutional kernels are defined on fixed grids and focus on all input pixels independently while moving over the input image, this strategy disregards the fact that handwritten characters can vary in shape, scale, and orientation even within the same document and that the ink pixels are more relevant than the background ones. To cope with these specific HTR…
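The difference between a fixed-grid kernel and a deformable one can be illustrated with a toy single-channel operator. The sketch below is a generic deformable convolution in the style of Dai et al. (2017). It is not the authors' implementation. Each kernel tap at each output position reads the input at its regular grid location plus a fractional (dy, dx) offset, sampled with bilinear interpolation, so the receptive field can bend to follow a stroke. With all offsets at zero it reduces to an ordinary convolution (cross-correlation). The function names `bilinear_sample`, `deform_conv2d` and `zero_offsets` are hypothetical. This sketch also makes some simplifying assumptions. The offsets are passed in directly, whereas a real model predicts them with an extra learned convolution over the input. It uses stride 1 and no padding. Samples falling outside the image read as 0.

```python
import math
from typing import List, Tuple

Image = List[List[float]]
# offsets[i][j][k] = (dy, dx) for kernel tap k (row-major) at output (i, j)
Offsets = List[List[List[Tuple[float, float]]]]


def bilinear_sample(img: Image, y: float, x: float) -> float:
    """Read img at a fractional (y, x); out-of-bounds neighbours contribute 0."""
    h, w = len(img), len(img[0])
    y0, x0 = math.floor(y), math.floor(x)
    value = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                # Bilinear weight shrinks linearly with distance to the neighbour.
                weight = (1 - abs(y - yy)) * (1 - abs(x - xx))
                value += weight * img[yy][xx]
    return value


def deform_conv2d(img: Image, kernel: Image, offsets: Offsets) -> Image:
    """Hypothetical single-channel deformable convolution (stride 1, no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(img) - kh + 1, len(img[0]) - kw + 1
    out: Image = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0.0
            for a in range(kh):
                for b in range(kw):
                    # Regular grid position (i + a, j + b) shifted by a learned offset.
                    dy, dx = offsets[i][j][a * kw + b]
                    acc += kernel[a][b] * bilinear_sample(img, i + a + dy, j + b + dx)
            row.append(acc)
        out.append(row)
    return out


def zero_offsets(out_h: int, out_w: int, taps: int) -> Offsets:
    """All-zero offsets: the deformable layer behaves like a fixed-grid convolution."""
    return [[[(0.0, 0.0)] * taps for _ in range(out_w)] for _ in range(out_h)]


img = [[1.0, 2.0, 3.0, 4.0],
       [5.0, 6.0, 7.0, 8.0],
       [9.0, 10.0, 11.0, 12.0],
       [13.0, 14.0, 15.0, 16.0]]
kernel = [[1.0, 0.0], [0.0, 1.0]]

print(deform_conv2d(img, kernel, zero_offsets(3, 3, 4))[0][0])  # 7.0 (fixed grid: 1 + 6)
half_right = [[[(0.0, 0.5)] * 4 for _ in range(3)] for _ in range(3)]
print(deform_conv2d(img, kernel, half_right)[0][0])  # 8.0 (taps moved half a pixel right: 1.5 + 6.5)
```

In a trained network the offsets differ per tap and per location. This lets the kernel sample along slanted or elongated ink strokes instead of a rigid square, which is the property the paper relies on to handle variations in character shape, scale and orientation.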
Taxonomy
Topics: Handwritten Text Recognition Techniques · Image Processing and 3D Reconstruction · Hand Gesture Recognition Systems
