Development of a multi-user handwriting recognition system using Tesseract open source OCR engine

Sandip Rakshit; Subhadip Basu

arXiv:1003.5886·cs.CV·March 31, 2010·4 cites

Open Access

TL;DR

This paper presents a multi-user handwritten Roman script recognition system using the open-source Tesseract OCR engine, trained with user-specific data to improve accuracy for individual handwriting styles.

Contribution

It introduces a method for training user-specific Tesseract models, each treated as a separate language-set, to recognize an individual user's handwritten text; the best user-specific model reaches 87.92% accuracy.
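As a rough illustration of how one model per user might be selected at recognition time, the sketch below builds a Tesseract command line that picks a language-set by user ID. The language names (`hw_user1` and so on), the `USER_LANGS` mapping, and the helper function are hypothetical and do not come from the paper; only the general CLI form `tesseract <image> <outputbase> -l <lang>` is standard.

```python
# Hypothetical mapping from a user ID to the name of that user's trained
# Tesseract language-set. These names are illustrative assumptions.
USER_LANGS = {
    "user1": "hw_user1",
    "user2": "hw_user2",
    "user3": "hw_user3",
}


def build_tesseract_command(image_path, output_base, user_id):
    """Return the argument list that runs Tesseract with one user's model."""
    try:
        lang = USER_LANGS[user_id]
    except KeyError:
        raise ValueError(f"no trained model for user {user_id!r}") from None
    # General CLI form: tesseract <image> <outputbase> -l <lang>
    return ["tesseract", str(image_path), str(output_base), "-l", lang]


if __name__ == "__main__":
    # Only prints the command; running it requires Tesseract and the
    # corresponding trained data to be installed.
    print(build_tesseract_command("page.tif", "page_out", "user2"))
```

Passing the returned list to `subprocess.run(..., check=True)` would perform the recognition, provided Tesseract and the matching trained data files are installed.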

Findings

01

User-specific models achieved up to 87.92% accuracy.

02

Overall system accuracy was 78.39%.

03

Character segmentation failure rate was 10.96%.

Abstract

The objective of the paper is to recognize handwritten samples of lower-case Roman script using the Tesseract open-source Optical Character Recognition (OCR) engine, released under Apache License 2.0. Handwritten data samples containing isolated and free-flow text were collected from different users. Tesseract is trained with user-specific data samples from both categories of document pages to generate separate user models, each representing a unique language-set. Each such language-set recognizes isolated and free-flow handwritten test samples collected from the designated user. In a three-user model, the system is trained with 1844, 1535 and 1113 isolated handwritten character samples collected from three different users, and the performance is tested on 1133, 1186 and 1204 character samples collected from the test sets of the three users respectively. The user-specific character-level accuracies were…
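The dataset sizes quoted in the abstract can be tallied with a few lines of arithmetic. The sketch below also shows one simple way a character-level accuracy could be scored for isolated characters, by position-wise comparison against ground truth. The user labels and the scoring function are assumptions for illustration; the abstract as shown here does not state the paper's exact scoring procedure.

```python
# Sample counts as stated in the abstract. The labels user1..user3 are
# placeholders for the three (unnamed) users.
train_counts = {"user1": 1844, "user2": 1535, "user3": 1113}
test_counts = {"user1": 1133, "user2": 1186, "user3": 1204}

total_train = sum(train_counts.values())  # 4492
total_test = sum(test_counts.values())    # 3523


def character_accuracy(predicted, truth):
    """Percentage of positions where the predicted character matches.

    Assumes isolated characters, so predicted and truth align one to one.
    This is an illustrative metric, not necessarily the paper's.
    """
    if len(truth) == 0:
        raise ValueError("ground truth must not be empty")
    matches = sum(p == t for p, t in zip(predicted, truth))
    return 100.0 * matches / len(truth)


if __name__ == "__main__":
    print(total_train, total_test)
    print(character_accuracy("abcd", "abed"))
```

Free-flow text would need an alignment step (for example, edit distance) before scoring, because a segmentation failure shifts every later character out of position.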

Taxonomy

Topics: Handwritten Text Recognition Techniques · Vehicle License Plate Recognition · Image Processing and 3D Reconstruction