Arabic Handwritten Text Line Dataset
Hakim Bouchal, Ahror Belaid

TL;DR
This paper introduces a new annotated dataset for historical Arabic script that includes word-level position annotations, addressing a gap in existing resources for improving recognition systems.
Contribution
The paper provides what the authors believe to be the first dataset with word-level position annotations for historical Arabic texts, adding a resource for segmentation and recognition tasks.
Findings
Dataset enables better segmentation accuracy
Facilitates development of recognition systems for Arabic
Supports research in historical Arabic script analysis
Abstract
Segmentation of Arabic manuscripts into text lines and words is an important step toward making recognition systems more efficient and accurate. The problem of segmentation into text lines is solved, since there are carefully annotated datasets dedicated to this task. However, to the best of our knowledge, there is no dataset annotating the word positions of Arabic texts. In this paper, we present a new dataset specifically designed for historical Arabic script in which we annotate positions at the word level.
Taxonomy
Topics: Handwritten Text Recognition Techniques · Image Processing and 3D Reconstruction · Natural Language Processing Techniques
