Khayyam Offline Persian Handwriting Dataset
Pourya Jafarzadeh, Padideh Choobdar, Vahid Mohammadi Safarzadeh

TL;DR
The Khayyam dataset provides a large, diverse collection of Persian handwriting samples, including rare words, to facilitate machine learning research in Persian handwriting recognition.
Contribution
The paper introduces a comprehensive Persian handwriting dataset, available for research, with extensive samples that include rare words, to support improved recognition research.
Findings
Machine learning algorithms trained on the digit, letter, and word data achieve promising results.
The dataset contains 44000 words, 60000 letters, and 6000 digits, collected from 400 native Persian writers.
These experiments demonstrate the dataset's applicability for training and evaluating handwriting recognition models.
Abstract
Handwriting analysis remains an important application of machine learning. A basic requirement for any handwriting recognition application is the availability of comprehensive datasets. Standard labelled datasets play a significant role in training and evaluating learning algorithms. In this paper, we present the Khayyam dataset, a large unconstrained handwriting dataset covering elements of the Persian language (words, sentences, letters, digits). We intentionally concentrated on collecting Persian word samples, which are rare in the currently available datasets. The Khayyam dataset contains 44000 words, 60000 letters, and 6000 digits. The collection forms were filled out by 400 native Persian writers. To show the applicability of the dataset, machine learning algorithms are trained on the digit, letter, and word data, and the results are reported. This dataset is available for research…
Taxonomy
Topics: Handwritten Text Recognition Techniques · Image Processing and 3D Reconstruction · Natural Language Processing Techniques
