TL;DR
This paper explores cross-script transfer learning for Arabic-script handwritten text recognition, demonstrating that shared characters drive transfer benefits and providing new state-of-the-art results in low-resource settings.
Contribution
It introduces a controlled study on cross-script training for Arabic-script HTR, highlighting the importance of shared characters and releasing code and data for reproducibility.
Findings
Cross-script transfer improves recognition accuracy in low-resource Arabic-script HTR.
Shared characters across scripts are key to transfer success.
Joint training achieves state-of-the-art results on Persian and Urdu datasets.
Abstract
Handwritten Text Recognition (HTR) under limited labeled data remains a challenging problem, particularly for Arabic-script languages. Although modern sequence-based recognizers perform well in high-resource settings, their accuracy degrades sharply as training data becomes scarce. Arabic-script languages share a common writing system with substantial character overlap, motivating cross-script training as a strategy to mitigate data scarcity. We performed experiments on Arabic, Urdu, and Persian scripts and achieved improvements over single-script baselines (new SotA especially for low-resource settings). A key finding of our experiments is that cross-script transfer is largely driven by script-level overlap rather than uniform accuracy improvements. Through a statistical character-level analysis we show that gains are structurally concentrated on characters shared across scripts, while…
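The character-level analysis described in the abstract can be illustrated with a minimal sketch. It is not the authors' released code: the helper names (`char_inventory`, `split_by_overlap`, `group_accuracy`) and the toy transcriptions and counts are hypothetical, chosen only to show how a target script's characters could be split into those shared with a source script and those unique to the target, so that accuracy can be compared per group.

```python
from collections import Counter


def char_inventory(lines):
    """Return the set of non-whitespace characters seen in the transcriptions."""
    return {ch for line in lines for ch in line if not ch.isspace()}


def split_by_overlap(target_lines, source_lines):
    """Split the target inventory into (shared with source, target-only)."""
    target = char_inventory(target_lines)
    source = char_inventory(source_lines)
    shared = target & source
    return shared, target - shared


def group_accuracy(correct, total, chars):
    """Micro-averaged accuracy over a character group; NaN if the group is empty."""
    n = sum(total[c] for c in chars)
    return sum(correct[c] for c in chars) / n if n else float("nan")


# Toy transcriptions (hypothetical), written as escapes to keep code points explicit.
persian_lines = [
    "\u067e\u062f\u0631",        # PEH, DAL, REH
    "\u06a9\u062a\u0627\u0628",  # KEHEH, TEH, ALEF, BEH
]
arabic_lines = [
    "\u0643\u062a\u0627\u0628",  # Arabic KAF, TEH, ALEF, BEH
    "\u062f\u0631",              # DAL, REH
]

shared, target_only = split_by_overlap(persian_lines, arabic_lines)

# Hypothetical per-character counts for one model on the target test set:
# 10 occurrences of every character, 9 correct for shared, 6 for target-only.
total = Counter({c: 10 for c in shared | target_only})
correct = Counter({c: 9 for c in shared})
correct.update({c: 6 for c in target_only})

shared_acc = group_accuracy(correct, total, shared)
target_only_acc = group_accuracy(correct, total, target_only)
```

Running the same grouping for a single-script baseline and for a jointly trained model, then comparing the per-group differences, is one way to check whether gains sit mainly on the shared characters. The counts above are placeholders, not results from the paper.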