Recognition of Handwritten Roman Script Using Tesseract Open source OCR Engine
Sandip Rakshit, Subhadip Basu

TL;DR
This study evaluates how well the open-source Tesseract OCR engine recognizes handwritten lowercase Roman script, reporting a character-level accuracy of 83.5% on user-specific handwriting samples.
Contribution
It applies Tesseract OCR to handwritten Roman script, trains a user-specific model, and reports its character-level accuracy, segmentation failure rate, and misclassification rate.
Findings
Character-level accuracy of 83.5%
Segmentation failure rate of 5.56%
Misclassification rate of 10.94%
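The three rates partition the test set: 83.5 + 5.56 + 10.94 = 100, so every test character is either recognized correctly, not segmented, or misclassified. The short Python sketch below checks this and back-computes approximate per-category counts for the 1133 test characters reported in the abstract. The counts are derived by rounding and are not figures stated by the authors; the names `RATES` and `approximate_counts` are hypothetical.

```python
import math

# Rates reported in the abstract (percent of the 1133 test characters).
TEST_CHARS = 1133
RATES = {
    "correct": 83.5,
    "segmentation_failure": 5.56,
    "misclassified": 10.94,
}

def approximate_counts(total, rates_pct):
    """Round each percentage of `total` to the nearest whole character.

    Hypothetical helper: these are back-computed estimates, not counts
    published in the paper.
    """
    return {name: round(total * pct / 100) for name, pct in rates_pct.items()}

# The three outcome categories should cover the whole test set.
assert math.isclose(sum(RATES.values()), 100.0)

counts = approximate_counts(TEST_CHARS, RATES)

# Recompute the rates from the rounded counts to confirm they reproduce
# the reported figures at the reported precision.
recomputed = {name: 100 * n / TEST_CHARS for name, n in counts.items()}
print(counts, {k: round(v, 2) for k, v in recomputed.items()})
```

Under this rounding the counts come out to roughly 946 correct, 63 unsegmented, and 124 misclassified characters, which sum to 1133 and reproduce the reported percentages.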
Abstract
In the present work, we have used the Tesseract 2.01 open-source Optical Character Recognition (OCR) engine, released under the Apache License 2.0, to recognize handwriting samples of lowercase Roman script. Handwritten isolated and free-flow text samples were collected from multiple users. Tesseract is trained to recognize user-specific handwriting samples from both categories of document pages. For a single-user model, the system is trained on 1844 isolated handwritten characters and tested on 1133 characters taken from the test set. The overall character-level accuracy of the system is 83.5%. The system fails to segment 5.56% of the characters and misclassifies 10.94% of them.
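The abstract reports three character-level outcomes: correct recognition, segmentation failure, and misclassification. As a minimal sketch, assuming each test character has already been aligned one-to-one with the engine's output (the paper's alignment procedure is not described here), the rates could be computed as below. `CharResult` and `character_metrics` are hypothetical names, and representing a segmentation failure as `predicted=None` is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class CharResult:
    truth: str                # ground-truth lowercase character
    predicted: Optional[str]  # None when the engine failed to segment it (assumed encoding)


def character_metrics(results: Sequence[CharResult]) -> dict:
    """Return accuracy, segmentation-failure and misclassification rates in percent."""
    total = len(results)
    if total == 0:
        raise ValueError("no test characters")
    seg_fail = sum(1 for r in results if r.predicted is None)
    # A None prediction never equals a str, so the two counts are disjoint.
    correct = sum(1 for r in results if r.predicted == r.truth)
    # Everything segmented but not matching the ground truth is misclassified.
    misclassified = total - seg_fail - correct
    return {
        "accuracy": 100.0 * correct / total,
        "segmentation_failure": 100.0 * seg_fail / total,
        "misclassification": 100.0 * misclassified / total,
    }


# Toy example (invented data, not from the paper): one misread, one unsegmented.
sample = [
    CharResult("a", "a"),
    CharResult("b", "h"),
    CharResult("c", None),
    CharResult("d", "d"),
]
metrics = character_metrics(sample)
print(metrics)
```

By construction the three rates always sum to 100%, which matches the way the reported 83.5%, 5.56%, and 10.94% figures partition the test set.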
Taxonomy
Topics: Handwritten Text Recognition Techniques · Vehicle License Plate Recognition · Computer Science and Engineering
