Understanding Cross-Language Transfer Improvements in Low-Resource HTR: The Role of Sequence Modeling
Sana Al-azzawi, Chang Liu, Nudrat Habib, Elisa Barney, Marcus Liwicki

TL;DR
This study investigates how sequence modeling, rather than shared visual features, enhances cross-language transfer in low-resource handwritten text recognition for Arabic-script languages, emphasizing the importance of contextual dependencies.
Contribution
The paper provides a controlled architectural comparison showing that sequence-level modeling, rather than shared visual representations alone, is key to cross-language transfer improvements in low-resource HTR.
Findings
CRNN models outperform CNN-only models in multi-script transfer scenarios.
Sequence-level modeling correlates with improved transfer performance.
Shared visual features alone are insufficient for effective cross-language transfer.
Abstract
Handwritten Text Recognition (HTR) for Arabic-script languages benefits from cross-language joint training under low-resource conditions, particularly when using CRNN-based models that combine convolutional encoders with sequence modeling. However, it remains unclear whether these improvements are better explained by shared visual representations or sequence-level dependencies. In this work, we conduct a controlled architectural study of line-level Arabic-script HTR, comparing CNN-only models with CTC decoding and CRNN models under identical single-script and multi-script training regimes. Experiments are performed on Arabic (KHATT), Urdu (NUST-UHWR), and Persian (PHTD) datasets under low-resource settings (K in {100, 500, 1000}). Our results show a clear divergence in transfer behavior: while CNN-only models exhibit limited or unstable improvements, CRNN models achieve better…
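To make the low-resource training regimes concrete, the following is a minimal sketch of how single-script and multi-script training sets could be assembled for a given K. It is not the authors' pipeline: the function names, the placeholder file paths, the fixed seed, and the choice to draw K lines from every script in the multi-script regime are assumptions for illustration, since the abstract does not describe the sampling procedure.

```python
import random


def sample_low_resource(lines, k, seed=0):
    """Draw k training lines without replacement, reproducibly for a given seed."""
    if k > len(lines):
        raise ValueError(f"requested {k} lines but only {len(lines)} are available")
    rng = random.Random(seed)
    return rng.sample(lines, k)


def build_regimes(corpora, target, k, seed=0):
    """Return (single_script, multi_script) training lists for one target script.

    corpora maps a dataset name to a list of (image_path, transcript) pairs.
    Assumption: the multi-script regime uses k lines from each script.
    """
    subsets = {
        name: sample_low_resource(lines, k, seed)
        for name, lines in corpora.items()
    }
    single_script = list(subsets[target])
    # Concatenate in a fixed dataset order so the result is deterministic.
    multi_script = [ex for name in sorted(subsets) for ex in subsets[name]]
    return single_script, multi_script


# Placeholder corpora standing in for the real line images and transcripts.
corpora = {
    name: [(f"{name}/line_{i}.png", f"transcript {i}") for i in range(2000)]
    for name in ("KHATT", "NUST-UHWR", "PHTD")
}

for k in (100, 500, 1000):
    single, multi = build_regimes(corpora, target="KHATT", k=k)
    print(k, len(single), len(multi))
```

Under these assumptions, both the CNN-only and the CRNN model would be trained once on `single_script` and once on `multi_script` for each K, so that any difference in transfer behavior can be attributed to the architecture rather than to the data.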
