Database of handwritten Arabic mathematical formulas images
Ibtissem Hadj Ali, Mohammed Ali Mahjoub

TL;DR
This paper introduces a new publicly available database of handwritten Arabic mathematical formulas, including images and structured ground truth, to support recognition system development and evaluation.
Contribution
It provides a publicly available, annotated dataset of handwritten Arabic mathematical expressions, a recognition domain where ground-truthed resources are currently scarce.
Findings
Contains 4,238 handwritten expressions from 66 writers
Includes 20,300 isolated symbol images
Provides structured XML ground truth with MathML
Abstract
Although publicly available, ground-truthed databases have proven useful for training, evaluating, and comparing recognition systems in many domains, few such databases currently exist for handwritten Arabic mathematical formula recognition. In this paper, we present a new public database of mathematical expressions in their off-line handwritten form. We describe the steps taken to build this database, from the creation of the mathematical expression corpora to the transcription of the collected data. Currently, the dataset contains 4,238 off-line handwritten mathematical expressions written by 66 writers and 20,300 handwritten isolated symbol images. Ground truth for the handwritten expressions is provided as XML files containing the number of symbols and the MathML structure.
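The abstract says each expression has an XML ground-truth file holding a symbol count and a MathML structure, but it does not give the schema. The sketch below shows how such a record might be read. The wrapper element `expression`, its `id` and `writer` attributes, the `symbolCount` element, and the sample content are all hypothetical; only the MathML namespace and the `mi`/`mn`/`mo` token elements come from the MathML standard. Check the database's actual files before relying on any of these names.

```python
import xml.etree.ElementTree as ET

# Standard MathML namespace URI.
MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# Hypothetical ground-truth record: element and attribute names outside
# the <math> subtree are assumptions, not the database's real schema.
SAMPLE = """<expression id="expr_0001" writer="w01">
  <symbolCount>3</symbolCount>
  <math xmlns="http://www.w3.org/1998/Math/MathML">
    <mrow><mi>س</mi><mo>+</mo><mn>٢</mn></mrow>
  </math>
</expression>"""

# MathML token elements that carry a single identifier, number, or operator.
TOKEN_TAGS = {"mi", "mn", "mo"}


def read_ground_truth(xml_text):
    """Return (declared symbol count, list of symbols found in the MathML)."""
    root = ET.fromstring(xml_text)
    declared = int(root.findtext("symbolCount"))
    math = root.find(f"{{{MATHML_NS}}}math")
    symbols = [
        el.text.strip()
        for el in math.iter()
        # Strip the "{namespace}" prefix ElementTree puts on each tag.
        if el.tag.split("}")[-1] in TOKEN_TAGS and el.text
    ]
    return declared, symbols


declared, symbols = read_ground_truth(SAMPLE)
print(declared, len(symbols))
```

Comparing the declared count with the number of MathML tokens is a cheap consistency check when loading annotations, provided every symbol in the real files maps to exactly one token element.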
Taxonomy
Topics: Handwritten Text Recognition Techniques · Religion and Sociopolitical Dynamics in Nigeria · Mathematics, Computing, and Information Processing
