New Results for the Text Recognition of Arabic Maghribī Manuscripts -- Managing an Under-resourced Script
Lucas Noëmie, Clément Salah (SU, UNIL), Chahan Vidal-Gorène (ENC)

TL;DR
This paper introduces a new approach for developing handwritten text recognition models for Arabic Maghribi manuscripts, achieving high accuracy with minimal manual transcription, thus aiding under-resourced script processing.
Contribution
It proposes a novel word-based neural method tailored for Arabic scripts, demonstrating effective recognition with limited data, advancing digital humanities tools for under-resourced languages.
Findings
Achieved an error rate below 5% with only 10 manually transcribed pages.
Validated the effectiveness of a word-based neural approach for Arabic.
Opened new perspectives for processing Arabic scripts and, more generally, poorly-endowed languages.
Abstract
The development of HTR models has become a conventional step in digital humanities projects. The performance of these models, often quite high, relies on manual transcription and numerous handwritten documents. Although the method has proven successful for Latin scripts, a similar amount of data is not yet achievable for scripts considered poorly-endowed, like Arabic scripts. In that respect, we introduce and assess a new modus operandi for HTR model development and fine-tuning dedicated to the Arabic Maghribī scripts. The comparison between several state-of-the-art HTR systems demonstrates the relevance of a word-based neural approach specialized for Arabic, capable of achieving an error rate below 5% with only 10 pages manually transcribed. These results open new perspectives for Arabic script processing and, more generally, for the processing of poorly-endowed languages. This research is part…
Taxonomy
Topics: Natural Language Processing Techniques · Mathematics, Computing, and Information Processing · Handwritten Text Recognition Techniques
