Cross-Language Learning within Arabic Script for Low-Resource HTR

Sana Al-azzawi; Elisa Barney; Marcus Liwicki

arXiv:2605.02089·cs.CV·May 5, 2026

TL;DR

This paper studies cross-script training across Arabic, Urdu, and Persian for handwritten text recognition (HTR), showing that transfer gains come mainly from characters shared across scripts and reporting new state-of-the-art results in low-resource settings.

Contribution

It introduces a controlled study on cross-script training for Arabic-script HTR, highlighting the importance of shared characters and releasing code and data for reproducibility.

Findings

01 Cross-script transfer improves recognition accuracy in low-resource Arabic-script HTR.

02 Characters shared across scripts are the main driver of transfer gains.

03 Joint training achieves state-of-the-art results on Persian and Urdu datasets.

Abstract

Handwritten Text Recognition (HTR) under limited labeled data remains a challenging problem, particularly for Arabic-script languages. Although modern sequence-based recognizers perform well in high-resource settings, their accuracy degrades sharply as training data becomes scarce. Arabic-script languages share a common writing system with substantial character overlap, motivating cross-script training as a strategy to mitigate data scarcity. We performed experiments on Arabic, Urdu, and Persian scripts and achieved improvements over single-script baselines (new SotA especially for low-resource settings). A key finding of our experiments is that cross-script transfer is largely driven by script-level overlap rather than uniform accuracy improvements. Through a statistical character-level analysis we show that gains are structurally concentrated on characters shared across scripts, while…
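
The character-level view described in the abstract, where characters are grouped by whether they are shared across scripts, can be illustrated with a short sketch. This is not the authors' released code: the function names (`char_inventory`, `split_shared_specific`, `mean_gain`), the per-character accuracy dictionaries, and the toy strings are all assumptions made for illustration, and the toy strings are not real alphabets or dataset samples.

```python
def char_inventory(transcripts):
    """Set of non-whitespace characters seen in a list of transcript strings."""
    return {ch for line in transcripts for ch in line if not ch.isspace()}


def split_shared_specific(inventories):
    """Split characters into those present in every language's inventory
    and those that appear in exactly one language's inventory."""
    shared = set.intersection(*inventories.values())
    specific = {}
    for lang, chars in inventories.items():
        # Union of every other language's characters.
        others = set().union(*(c for l, c in inventories.items() if l != lang))
        specific[lang] = chars - others
    return shared, specific


def mean_gain(baseline_acc, joint_acc, chars):
    """Average per-character accuracy change (joint minus baseline) over `chars`.

    `baseline_acc` and `joint_acc` are hypothetical dicts mapping a character
    to its recognition accuracy under single-script and joint training.
    """
    gains = [joint_acc[c] - baseline_acc[c]
             for c in chars if c in baseline_acc and c in joint_acc]
    return sum(gains) / len(gains) if gains else float("nan")


# Toy inventories built from made-up strings (escapes avoid bidi rendering issues).
inventories = {
    "ar": char_inventory(["\u0628\u0627\u0628"]),            # beh, alef, beh
    "fa": char_inventory(["\u067e\u0627", "\u0628\u0627"]),  # peh+alef, beh+alef
    "ur": char_inventory(["\u0679\u0627\u0628"]),            # tteh, alef, beh
}
shared, specific = split_shared_specific(inventories)
```

With per-character accuracies from a single-script baseline and a jointly trained model, `mean_gain(baseline, joint, shared)` can be compared against `mean_gain(baseline, joint, specific[lang])` to check whether improvements are concentrated on the shared group. Whether this matches the paper's statistical analysis is not something the truncated abstract confirms.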

Code & Models

Repositories

github