An Efficient Language-Independent Multi-Font OCR for Arabic Script

Hussein Osman; Karim Zaghw; Mostafa Hazem; Seifeldin Elsehely

arXiv:2009.09115 · cs.CV · September 22, 2020 · 1 citation

TL;DR

This paper presents a complete, language-independent OCR system for Arabic script that pairs a font-independent character segmentation algorithm with a neural network recognizer, evaluated on multiple datasets with a reported overall accuracy of 97.94%.

Contribution

It introduces an improved font-independent character segmentation algorithm, reported to outperform state-of-the-art segmentation algorithms, and a neural network model for character recognition.

Findings

01. Character segmentation accuracy: 98.06%

02. Character recognition accuracy: 99.89%

03. Overall system accuracy: 97.94%
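
The three figures can be related with a short back-of-the-envelope check. The sketch below assumes that segmentation and recognition errors are independent and that overall accuracy is roughly the product of the two stage accuracies. That is an assumption made here for illustration; this page does not say how the overall figure was computed. The function name `chained_accuracy` is hypothetical.

```python
from math import prod

# Figures as listed in the findings above.
SEGMENTATION_ACCURACY = 0.9806
RECOGNITION_ACCURACY = 0.9989
REPORTED_OVERALL = 0.9794


def chained_accuracy(*stage_accuracies: float) -> float:
    """Accuracy of sequential stages, ASSUMING independent errors (illustrative only)."""
    return prod(stage_accuracies)


estimate = chained_accuracy(SEGMENTATION_ACCURACY, RECOGNITION_ACCURACY)
gap = abs(estimate - REPORTED_OVERALL)
print(f"estimated overall: {estimate:.2%}, reported: {REPORTED_OVERALL:.2%}, gap: {gap:.2%}")
```

Under that assumption the product comes to about 97.95%, within rounding distance of the reported 97.94%, which suggests the three numbers are at least mutually consistent.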

Abstract

Optical Character Recognition (OCR) is the process of extracting digitized text from images of scanned documents. While OCR systems have already matured for many languages, they still have shortcomings for cursive scripts with overlapping letters, such as Arabic. This paper proposes a complete Arabic OCR system that takes a scanned image of Arabic Naskh script as input and generates a corresponding digital document. The system consists of the following modules: Pre-processing, Word-level Feature Extraction, Character Segmentation, Character Recognition, and Post-processing. The paper also proposes an improved font-independent character segmentation algorithm that outperforms state-of-the-art segmentation algorithms. Lastly, the paper proposes a neural network model for the character recognition task. The system was evaluated on several open Arabic…
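
The five modules named in the abstract can be pictured as stages applied in order. The following is a minimal sketch of that chaining only, not the authors' implementation: every function name is hypothetical, the binarization threshold is an assumed value, the word splitter is a simple empty-column heuristic swapped in for illustration, and the segmentation and recognition stages are placeholders.

```python
from typing import Callable, Dict, List

Image = List[List[int]]  # grayscale rows, 0 (black) to 255 (white)


def preprocess(doc: Dict) -> Dict:
    # Assumed stand-in: fixed-threshold binarization (1 = ink, 0 = background).
    doc["binary"] = [[1 if px < 128 else 0 for px in row] for row in doc["image"]]
    return doc


def extract_words(doc: Dict) -> Dict:
    # Assumed stand-in: split wherever a column contains no ink.
    binary = doc["binary"]
    width = len(binary[0]) if binary else 0
    ink = [any(row[x] for row in binary) for x in range(width)]
    spans, start = [], None
    for x, has_ink in enumerate(ink + [False]):  # sentinel closes a trailing span
        if has_ink and start is None:
            start = x
        elif not has_ink and start is not None:
            spans.append((start, x))  # half-open column range [start, x)
            start = None
    doc["word_spans"] = spans
    return doc


def segment_characters(doc: Dict) -> Dict:
    # Placeholder: treats each word span as a single segment.
    doc["char_spans"] = list(doc["word_spans"])
    return doc


def recognize_characters(doc: Dict) -> Dict:
    # Placeholder: a real system would classify each segment with a trained model.
    doc["chars"] = ["?" for _ in doc["char_spans"]]
    return doc


def postprocess(doc: Dict) -> Dict:
    # Placeholder: joins the recognized characters into output text.
    doc["text"] = "".join(doc["chars"])
    return doc


PIPELINE: List[Callable[[Dict], Dict]] = [
    preprocess, extract_words, segment_characters, recognize_characters, postprocess,
]


def run_ocr(image: Image) -> Dict:
    doc: Dict = {"image": image}
    for stage in PIPELINE:
        doc = stage(doc)
    return doc


result = run_ocr([[255, 0, 0, 255, 255, 0, 255]])
print(result["word_spans"], result["text"])
```

Keeping each module as a function over a shared dictionary makes it easy to swap one stage, such as the segmentation step, without touching the others.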

Taxonomy

Topics: Handwritten Text Recognition Techniques · Vehicle License Plate Recognition · Image Processing and 3D Reconstruction