Leveraging machine learning for less developed languages: Progress on Urdu text detection
Hazrat Ali

TL;DR
This paper introduces a new dataset and machine learning approach for detecting Urdu text in natural scene images, addressing the lack of resources and improving detection accuracy.
Contribution
It presents a novel dataset for Urdu text detection and a multi-stage SVM-based method that enhances detection performance in complex scene images.
Findings
Developed a publicly available Urdu scene image dataset.
Implemented a multi-stage SVM classifier with HoG features.
Improved text region detection by adding a second, HoG-based SVM stage that removes remaining non-text regions.
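As a rough illustration of what a HoG feature captures, the sketch below computes a single magnitude-weighted histogram of unsigned gradient orientations for one grayscale patch. It is a simplified building block, not the paper's feature extractor: the function name `orientation_histogram`, the list-of-rows input format, the 9-bin default, and the use of one histogram for the whole patch (no cells or block normalisation) are all assumptions made for illustration.

```python
import math


def orientation_histogram(patch, n_bins=9):
    """Unsigned gradient-orientation histogram for one grayscale patch.

    patch: list of equal-length rows of numbers (hypothetical input format).
    Returns n_bins magnitude-weighted values, L2-normalised when non-zero.
    """
    rows, cols = len(patch), len(patch[0])
    hist = [0.0] * n_bins
    bin_width = 180.0 / n_bins
    # Central differences on interior pixels only; border pixels are skipped.
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = patch[r][c + 1] - patch[r][c - 1]
            gy = patch[r + 1][c] - patch[r - 1][c]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue
            # Fold the direction into [0, 180) so opposite gradients share a bin.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            # The modulo guards against a rounding result of exactly 180.0.
            hist[int(angle // bin_width) % n_bins] += magnitude
    norm = math.sqrt(sum(v * v for v in hist))
    return [v / norm for v in hist] if norm > 0 else hist
```

A patch containing a single vertical edge puts all of its energy in the first bin (gradient direction near 0 degrees), while a flat patch yields an all-zero histogram. A full HoG descriptor would concatenate many such histograms over a grid of cells before passing them to a classifier.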
Abstract
Text detection in natural scene images has applications in autonomous driving and in navigation assistance for elderly and blind people. However, research on Urdu text detection is usually hindered by a lack of data resources. We have developed a dataset of scene images with Urdu text. We present the use of machine learning methods to detect Urdu text in these scene images. We extract candidate text regions using a channel-enhanced Maximally Stable Extremal Regions (MSER) method. First, we classify regions as text or noise based on their geometric properties. Next, we use a support vector machine (SVM) for early discarding of non-text regions. To further remove non-text regions, we extract histogram of oriented gradients (HoG) features and train a second SVM classifier. This improves the overall performance of text region detection within the scene images. To support research on Urdu text, we aim to…
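The staged filtering described in the abstract can be sketched as a simple cascade in which each stage discards candidates before the next, more expensive stage runs. The sketch below is only an illustration under stated assumptions: the `Region` type, the helper names `passes_geometry` and `cascade`, and every threshold value are hypothetical and are not taken from the paper. The two SVM stages are represented by arbitrary callables rather than trained classifiers.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Region:
    """Bounding box and pixel count of one candidate region (e.g. from MSER)."""
    x: int
    y: int
    width: int
    height: int
    area: int  # number of pixels that belong to the region


# Hypothetical thresholds, chosen only for illustration.
MIN_ASPECT, MAX_ASPECT = 0.1, 10.0
MIN_EXTENT = 0.2  # region pixels divided by bounding-box pixels
MIN_AREA = 30


def passes_geometry(region: Region) -> bool:
    """Stage 1: reject regions whose shape is implausible for text."""
    if region.width <= 0 or region.height <= 0:
        return False
    aspect = region.width / region.height
    extent = region.area / (region.width * region.height)
    return (
        region.area >= MIN_AREA
        and MIN_ASPECT <= aspect <= MAX_ASPECT
        and extent >= MIN_EXTENT
    )


def cascade(
    regions: Iterable[Region],
    stages: Iterable[Callable[[Region], bool]],
) -> List[Region]:
    """Apply each stage in order, keeping only regions that pass all of them."""
    kept = list(regions)
    for stage in stages:
        kept = [r for r in kept if stage(r)]
    return kept
```

In use, the stage list would look like `[passes_geometry, first_svm_accepts, hog_svm_accepts]`, where the last two names stand in for predicates wrapping the two trained SVM classifiers. Ordering the cheap geometric test first means the later stages see fewer candidates.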
Taxonomy
Topics: Handwritten Text Recognition Techniques · Vehicle License Plate Recognition · Image Retrieval and Classification Techniques
Methods: Support Vector Machine
