Development of a multi-user handwriting recognition system using Tesseract open source OCR engine
Sandip Rakshit, Subhadip Basu

TL;DR
This paper presents a multi-user handwritten Roman script recognition system using the open-source Tesseract OCR engine, trained with user-specific data to improve accuracy for individual handwriting styles.
Contribution
It introduces a method for training a separate Tesseract model for each user to recognize that user's handwritten text, with user-specific models reaching up to 87.92% character-level accuracy.
Findings
User-specific models achieved up to 87.92% accuracy.
Overall system accuracy was 78.39%.
Character segmentation failure rate was 10.96%.
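The figures above are character-level percentages. As a minimal sketch of how such rates could be derived from raw counts, the Python below assumes every test character ends in exactly one of three outcomes: recognized correctly, misclassified, or lost to a segmentation failure. The class name, function name, and example counts are hypothetical illustrations, not the paper's data or evaluation code.

```python
from dataclasses import dataclass


@dataclass
class OutcomeCounts:
    """Hypothetical tally of test characters by outcome (assumed partition)."""
    correct: int
    misclassified: int
    segmentation_failures: int

    @property
    def total(self) -> int:
        return self.correct + self.misclassified + self.segmentation_failures


def character_rates(counts: OutcomeCounts) -> dict:
    """Return each outcome as a percentage of all test characters."""
    if counts.total == 0:
        raise ValueError("no test characters to evaluate")
    scale = 100.0 / counts.total
    return {
        "accuracy": counts.correct * scale,
        "misclassification": counts.misclassified * scale,
        "segmentation_failure": counts.segmentation_failures * scale,
    }


# Illustrative counts only; these are not taken from the paper.
example = character_rates(OutcomeCounts(correct=80, misclassified=12,
                                        segmentation_failures=8))
```

Under this assumption the three rates always sum to 100%, so a segmentation failure lowers accuracy in the same way a misclassification does.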
Abstract
The objective of the paper is to recognize handwritten samples of lower-case Roman script using the Tesseract open-source Optical Character Recognition (OCR) engine, released under Apache License 2.0. Handwritten data samples containing isolated and free-flow text were collected from different users. Tesseract is trained with user-specific data samples from both categories of document pages to generate separate user models, each representing a unique language set. Each such language set recognizes isolated and free-flow handwritten test samples collected from the designated user. In a three-user model, the system is trained with 1844, 1535 and 1113 isolated handwritten character samples collected from three different users, and performance is tested on 1133, 1186 and 1204 character samples collected from the test sets of the three users, respectively. The user-specific character-level accuracies were…
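The per-user setup described in the abstract can be sketched as follows. The Python below assumes each user's trained model is installed as a Tesseract language and selected with the command-line `-l` option. The user IDs, the language names (`hw_user1` and so on), and the helper name are hypothetical; only the sample counts come from the abstract. The function builds the command without running it, so the sketch does not require Tesseract to be installed.

```python
from pathlib import Path
from typing import List

# Hypothetical language names, one per user-specific trained model.
USER_LANGUAGES = {
    "user1": "hw_user1",
    "user2": "hw_user2",
    "user3": "hw_user3",
}

# Isolated character sample counts per user, as stated in the abstract.
TRAIN_SAMPLES = {"user1": 1844, "user2": 1535, "user3": 1113}
TEST_SAMPLES = {"user1": 1133, "user2": 1186, "user3": 1204}


def tesseract_command(image: Path, output_base: Path, user_id: str) -> List[str]:
    """Build a Tesseract CLI call that uses the given user's language set."""
    if user_id not in USER_LANGUAGES:
        raise KeyError(f"no trained model registered for {user_id!r}")
    return ["tesseract", str(image), str(output_base),
            "-l", USER_LANGUAGES[user_id]]


# Example: recognize a page written by the second user.
command = tesseract_command(Path("page.png"), Path("page_out"), "user2")
total_train = sum(TRAIN_SAMPLES.values())  # 4492 training characters
total_test = sum(TEST_SAMPLES.values())    # 3523 test characters
```

Keeping one language per user means a page is routed to a model by writer identity rather than by script, which is what lets each model specialize to one handwriting style.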
Taxonomy
Topics: Handwritten Text Recognition Techniques · Vehicle License Plate Recognition · Image Processing and 3D Reconstruction
