An Efficient Language-Independent Multi-Font OCR for Arabic Script
Hussein Osman, Karim Zaghw, Mostafa Hazem, Seifeldin Elsehely

TL;DR
This paper presents a complete, language-independent OCR system for Arabic script that combines a font-independent character segmentation algorithm with a neural-network character recognizer. It was evaluated on multiple datasets, with a reported overall accuracy of 97.94%.
Contribution
It introduces an improved font-independent character segmentation algorithm, reported to outperform state-of-the-art segmentation algorithms, and a neural network model for character recognition in Arabic OCR.
Findings
Character segmentation accuracy: 98.06%
Character recognition accuracy: 99.89%
Overall system accuracy: 97.94%
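As a rough consistency check on the three figures above, the sketch below multiplies the two per-stage accuracies and compares the result with the reported overall accuracy. It assumes that segmentation and recognition errors are independent and that the overall figure is an end-to-end character accuracy; the source states neither, and the variable names are illustrative only.

```python
# Reported figures from the Findings list.
segmentation_acc = 0.9806
recognition_acc = 0.9989
reported_overall = 0.9794

# Assumption (not stated in the source): the two stages fail independently,
# so end-to-end accuracy is roughly the product of the per-stage accuracies.
estimated_overall = segmentation_acc * recognition_acc

print(f"estimated overall: {estimated_overall:.2%}")
print(f"reported overall:  {reported_overall:.2%}")
print(f"difference:        {abs(estimated_overall - reported_overall):.4%}")
```

Under that assumption the estimate lands within about 0.02 percentage points of the reported value, which suggests the overall figure is dominated by segmentation errors rather than recognition errors.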
Abstract
Optical Character Recognition (OCR) is the process of extracting digitized text from images of scanned documents. While OCR systems have already matured in many languages, they still have shortcomings in cursive languages with overlapping letters such as the Arabic language. This paper proposes a complete Arabic OCR system that takes a scanned image of Arabic Naskh script as an input and generates a corresponding digital document. Our Arabic OCR system consists of the following modules: Pre-processing, Word-level Feature Extraction, Character Segmentation, Character Recognition, and Post-processing. This paper also proposes an improved font-independent character segmentation algorithm that outperforms the state-of-the-art segmentation algorithms. Lastly, the paper proposes a neural network model for the character recognition task. The system has been evaluated on several open Arabic…
Taxonomy
Topics: Handwritten Text Recognition Techniques · Vehicle License Plate Recognition · Image Processing and 3D Reconstruction
