Recognition of Handwritten Roman Script Using Tesseract Open Source OCR Engine

Sandip Rakshit; Subhadip Basu

arXiv:1003.5891 · cs.CV · March 31, 2010 · 4 cites

TL;DR

This study evaluates the open-source Tesseract OCR engine on handwritten lowercase Roman script. Trained on a single user's samples, it reaches a character-level accuracy of 83.5%.

Contribution

It shows that Tesseract, trained on user-specific samples, can recognize handwritten lowercase Roman script, and it reports character-level accuracy, segmentation-failure, and misclassification rates for a single-user model.

Findings

01

Character-level accuracy of 83.5%

02

Segmentation failure rate of 5.56%

03

Misclassification rate of 10.94%

Abstract

In the present work, we have used the Tesseract 2.01 open source Optical Character Recognition (OCR) engine, released under the Apache License 2.0, to recognize handwriting samples of lowercase Roman script. Handwritten isolated and free-flow text samples were collected from multiple users. Tesseract is trained to recognize user-specific handwriting samples from both categories of document pages. For a single-user model, the system is trained with 1844 isolated handwritten characters and its performance is tested on 1133 characters taken from the test set. The overall character-level accuracy of the system is observed to be 83.5%. The system fails to segment 5.56% of the characters and erroneously classifies 10.94% of them.
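
The three reported rates partition the 1133 test characters, so they should sum to 100%. The short Python sketch below checks that and back-computes the character counts each percentage implies. The counts are an inference from rounding the reported percentages, not figures stated in the abstract, and the names `implied_counts`, `TEST_CHARS`, and `REPORTED` are hypothetical.

```python
# Reported in the abstract: size of the test set and the three outcome rates (%).
TEST_CHARS = 1133
REPORTED = {
    "recognized": 83.5,
    "segmentation_failure": 5.56,
    "misclassified": 10.94,
}


def implied_counts(total, percentages):
    """Round each percentage of `total` to the nearest whole character.

    The result is an estimate: the abstract gives only percentages.
    """
    return {name: round(total * pct / 100) for name, pct in percentages.items()}


counts = implied_counts(TEST_CHARS, REPORTED)
total_pct = sum(REPORTED.values())

for name, n in counts.items():
    # Recompute each rate from the inferred count to compare with the reported one.
    print(f"{name}: ~{n} characters ({100 * n / TEST_CHARS:.2f}%)")
print(f"sum of reported rates: {total_pct:.2f}%")
```

Under this assumption the inferred counts add up to exactly 1133, which is consistent with the three outcomes being mutually exclusive and exhaustive.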

Taxonomy

Topics: Handwritten Text Recognition Techniques · Vehicle License Plate Recognition · Computer Science and Engineering