KOHTD: Kazakh Offline Handwritten Text Dataset
Nazgul Toiganbayeva, Mahmoud Kasem, Galymzhan Abdimanap, Kairat Bostanbekov, Abdelrahman Abdallah, Anel Alimova, Daniyar Nurseitov

TL;DR
This paper introduces KOHTD, a large Kazakh handwritten text dataset, and explores recognition methods including CTC, attention models, and a genetic algorithm for segmentation, to advance Kazakh handwriting recognition research.
Contribution
The paper provides the first extensive Kazakh handwritten text dataset and applies multiple recognition techniques, including a novel genetic algorithm for segmentation.
Findings
The KOHTD dataset contains over 140,000 segmented images and 922,000 symbols.
The recognition methods achieved promising results on Kazakh handwritten text.
The genetic algorithm effectively segments lines and words in handwritten Kazakh documents.
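For readers unfamiliar with CTC, one of the recognition approaches named above, the following is a minimal sketch of greedy CTC decoding: take the best class per frame, collapse consecutive repeats, then drop blanks. It is only an illustration and not the paper's implementation; the function name `ctc_greedy_decode`, the blank index, and the three-letter alphabet are assumptions made for the example.

```python
from itertools import groupby

BLANK = 0  # assumption: class index 0 is the CTC blank symbol


def ctc_greedy_decode(frame_scores, alphabet):
    """Decode per-frame class scores into text with greedy (best-path) CTC.

    frame_scores: list of per-frame score lists, one score per class.
    alphabet: string where alphabet[i - 1] is the character for class i.
    """
    # 1. Pick the highest-scoring class in every frame.
    best_path = [
        max(range(len(scores)), key=scores.__getitem__)
        for scores in frame_scores
    ]
    # 2. Collapse runs of the same class into a single label.
    collapsed = [label for label, _ in groupby(best_path)]
    # 3. Remove blanks and map the remaining labels to characters.
    return "".join(alphabet[label - 1] for label in collapsed if label != BLANK)


def one_hot(index, size):
    """Hypothetical helper that builds a toy score vector for the example."""
    return [1.0 if i == index else 0.0 for i in range(size)]


# Toy input: best path 1,1,0,2,2,0,2,3 over a hypothetical alphabet "қаз".
frames = [one_hot(i, 4) for i in (1, 1, 0, 2, 2, 0, 2, 3)]
decoded = ctc_greedy_decode(frames, "қаз")
```

The blank between the two runs of class 2 is what lets the decoder emit the same letter twice in a row; without it, the repeats would collapse into one character.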
Abstract
Despite the transition to digital information exchange, many documents, such as invoices, tax forms, memos, questionnaires, historical records, and answers to exam questions, still require handwritten input. This creates a need for Handwritten Text Recognition (HTR), an automatic way to transcribe such records using a computer. Handwriting recognition is challenging because of the virtually infinite number of ways a person can write the same message. To support Kazakh handwritten text recognition research, a comprehensive dataset of Kazakh handwritten texts is necessary, particularly given that no dataset of handwritten Kazakh text has been available. In this paper, we propose our extensive Kazakh Offline Handwritten Text Dataset (KOHTD), which contains 3,000 handwritten exam papers and more than 140,335 segmented images, and there are approximately…
Taxonomy
Topics: Handwritten Text Recognition Techniques · Natural Language Processing Techniques
Methods: Bidirectional LSTM · Sigmoid Activation · Tanh Activation · Long Short-Term Memory · Convolution · Support Vector Machine · Max Pooling · R-CNN · PyTorch DDP · Genetic Algorithms
