On the Generalization of Handwritten Text Recognition Models

Carlos Garrido-Munoz; Jorge Calvo-Zaragoza

arXiv:2411.17332·cs.LG·June 3, 2025

TL;DR

This paper investigates how handwritten text recognition models perform on out-of-distribution data, revealing key factors affecting generalization and providing insights for future improvements in real-world applications.

Contribution

It analyzes the limitations of HTR models in domain generalization without prior access to OOD data, highlighting the impact of textual and visual divergence.

Findings

01

Textual divergence is the most significant factor for generalization.

02

Model errors in OOD scenarios can be estimated with a discrepancy of less than 10 points in 70% of cases.

03

Synthetic data usage influences HTR model generalization.

Abstract

Recent advances in Handwritten Text Recognition (HTR) have led to significant reductions in transcription errors on standard benchmarks under the i.i.d. assumption, thus focusing on minimizing in-distribution (ID) errors. However, this assumption does not hold in real-world applications, which has motivated HTR research to explore Transfer Learning and Domain Adaptation techniques. In this work, we investigate the unaddressed limitations of HTR models in generalizing to out-of-distribution (OOD) data. We adopt the challenging setting of Domain Generalization, where models are expected to generalize to OOD data without any prior access. To this end, we analyze 336 OOD cases from eight state-of-the-art HTR models across seven widely used datasets, spanning five languages. Additionally, we study how HTR models leverage synthetic data to generalize. We reveal that the most significant…
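The figure of 336 OOD cases is consistent with one plausible reading of the setup: each of the eight models is trained on one of the seven datasets and evaluated on each of the remaining six. The abstract does not state this protocol explicitly, so the sketch below is an assumption, and the model and dataset names are hypothetical placeholders.

```python
from itertools import permutations

# Hypothetical placeholders: the abstract gives only the counts
# (eight models, seven datasets), not the names.
models = [f"model_{i}" for i in range(1, 9)]
datasets = [f"dataset_{i}" for i in range(1, 8)]

# Assumed protocol: train on one source dataset and evaluate on each
# of the other six, so every ordered (source, target) pair with
# source != target counts as one OOD case per model.
ood_cases = [
    (model, source, target)
    for model in models
    for source, target in permutations(datasets, 2)
]

print(len(ood_cases))  # 8 models * 7 sources * 6 targets = 336
```

Under this reading, the seven pairs where source and target coincide are the in-distribution (ID) evaluations and are excluded from the OOD count.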

Taxonomy

Topics: Handwritten Text Recognition Techniques · Image Processing and 3D Reconstruction · Image Retrieval and Classification Techniques

Methods: ADaptive gradient method with the OPTimal convergence rate