Arabic Handwritten Text Line Dataset

Hakim Bouchal; Ahror Belaid

arXiv:2312.07573·cs.CL·December 14, 2023·1 cites

Open Access

TL;DR

This paper introduces a new annotated dataset for historical Arabic script that includes word-level position annotations, addressing a gap in existing resources for improving recognition systems.

Contribution

To the authors' knowledge, this is the first dataset with word-level position annotations for historical Arabic texts, adding a resource for segmentation and recognition tasks.

Findings

01 Dataset enables better segmentation accuracy

02 Facilitates development of recognition systems for Arabic

03 Supports research in historical Arabic script analysis

Abstract

Segmentation of Arabic manuscripts into lines of text and words is an important step to make recognition systems more efficient and accurate. The problem of segmentation into text lines is solved, since there are carefully annotated datasets dedicated to this task. However, to the best of our knowledge, there are no datasets annotating the word positions of Arabic texts. In this paper, we present a new dataset specifically designed for historical Arabic script in which we annotate positions at the word level.
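The abstract does not say how word positions are stored, so the following is only a minimal sketch of what a word-level position annotation might look like and how it could be used. The `WordBox` class, its pixel-coordinate fields, and both helper functions are assumptions made for illustration; they are not the dataset's actual format. The sketch orders word boxes right to left, as Arabic is read, and scores a predicted box against an annotated one with intersection over union.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class WordBox:
    """Hypothetical word annotation: an axis-aligned box in line-image pixels."""
    x: int       # left edge
    y: int       # top edge
    width: int
    height: int

    @property
    def right(self) -> int:
        return self.x + self.width

    @property
    def bottom(self) -> int:
        return self.y + self.height


def reading_order(words: List[WordBox]) -> List[WordBox]:
    """Sort boxes by right edge, descending (Arabic is read right to left)."""
    return sorted(words, key=lambda w: w.right, reverse=True)


def iou(a: WordBox, b: WordBox) -> float:
    """Intersection over union of two boxes (0.0 when they do not overlap)."""
    inter_w = min(a.right, b.right) - max(a.x, b.x)
    inter_h = min(a.bottom, b.bottom) - max(a.y, b.y)
    if inter_w <= 0 or inter_h <= 0:
        return 0.0
    inter = inter_w * inter_h
    union = a.width * a.height + b.width * b.height - inter
    return inter / union


if __name__ == "__main__":
    # Made-up boxes for one text line, not taken from the dataset.
    line = [WordBox(10, 0, 30, 20), WordBox(100, 0, 40, 20), WordBox(50, 0, 40, 20)]
    for word in reading_order(line):
        print(word)
    print(iou(WordBox(0, 0, 10, 10), WordBox(5, 0, 10, 10)))
```

A real loader would need to follow whatever annotation schema the dataset ships with, which may use polygons or another representation rather than axis-aligned boxes.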

Taxonomy

Topics: Handwritten Text Recognition Techniques · Image Processing and 3D Reconstruction · Natural Language Processing Techniques