Leveraging machine learning for less developed languages: Progress on Urdu text detection

Hazrat Ali

arXiv:2209.14022 · cs.CV · September 29, 2022

Open Access

TL;DR

This paper introduces a new dataset and machine learning approach for detecting Urdu text in natural scene images, addressing the lack of resources and improving detection accuracy.

Contribution

It presents a novel dataset for Urdu text detection and a multi-stage SVM-based method that enhances detection performance in complex scene images.

Findings

01. Developed a publicly available Urdu scene image dataset.

02. Implemented a multi-stage SVM classifier with HoG features.

03. Achieved improved accuracy in Urdu text detection.
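
To make the HoG features in the second finding concrete, here is a minimal pure-Python sketch of an orientation histogram for a single grayscale patch. It is an illustration only, not the paper's implementation: the function name `hog_cell_histogram`, the 9-bin layout, the central-difference gradients, and the L2 normalisation are all assumptions, and a full HoG descriptor would additionally tile the region into cells and blocks.

```python
import math


def hog_cell_histogram(patch, n_bins=9):
    """Unsigned gradient-orientation histogram for one grayscale patch.

    `patch` is a list of equal-length rows of numbers. Gradients are
    central differences on interior pixels. Each pixel adds its gradient
    magnitude to the bin covering its orientation in [0, 180) degrees.
    The result is L2-normalised (all zeros for a flat patch).
    """
    height, width = len(patch), len(patch[0])
    hist = [0.0] * n_bins
    bin_width = 180.0 / n_bins  # assumption: 9 bins of 20 degrees each

    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = patch[y][x + 1] - patch[y][x - 1]
            gy = patch[y + 1][x] - patch[y - 1][x]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue
            # Fold the signed angle into [0, 180) so opposite directions match.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            index = min(int(angle // bin_width), n_bins - 1)
            hist[index] += magnitude

    norm = math.sqrt(sum(v * v for v in hist))
    return [v / norm for v in hist] if norm > 0 else hist


# A patch with a single vertical edge: all gradient energy is horizontal.
vertical_edge = [[0, 0, 10, 10] for _ in range(4)]
features = hog_cell_histogram(vertical_edge)
```

A feature vector like `features` is what would be passed to the second classifier; stroke-like regions tend to concentrate energy in a few orientation bins, which is the kind of cue such a classifier can exploit.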

Abstract

Text detection in natural scene images has applications in autonomous driving and in navigation assistance for elderly and blind people. However, research on Urdu text detection is often hindered by a lack of data resources. We have developed a dataset of scene images containing Urdu text. We present the use of machine learning methods to detect Urdu text in these scene images. We extract candidate text regions using a channel-enhanced Maximally Stable Extremal Regions (MSER) method. First, we separate text from noise based on the geometric properties of the regions. Next, we use a support vector machine (SVM) for early discarding of non-text regions. To further remove non-text regions, we extract histogram of oriented gradients (HoG) features and train a second SVM classifier on them. This improves the overall performance of text region detection within the scene images. To support research on Urdu text, we aim to…
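
The staged filtering described in the abstract can be sketched as a cascade in which each stage only sees the regions that survived the previous one. The sketch below is a hedged illustration under stated assumptions, not the authors' code: the `Region` type, the `passes_geometry` rule with its aspect-ratio, extent, and area thresholds, and the `cascade` helper are all hypothetical, and the two SVM stages are represented by arbitrary callables rather than trained models.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Region:
    """A candidate region (e.g. from MSER): bounding box plus pixel count."""
    x: int
    y: int
    w: int
    h: int
    area: int  # number of pixels that belong to the region


def passes_geometry(region: Region,
                    min_aspect: float = 0.1,
                    max_aspect: float = 10.0,
                    min_extent: float = 0.2,
                    min_area: int = 30) -> bool:
    """Geometric stage. All thresholds are illustrative assumptions."""
    if region.w <= 0 or region.h <= 0 or region.area < min_area:
        return False
    aspect = region.w / region.h                  # width-to-height ratio
    extent = region.area / (region.w * region.h)  # fill ratio of the box
    return min_aspect <= aspect <= max_aspect and extent >= min_extent


def cascade(regions: Iterable[Region],
            stages: List[Callable[[Region], bool]]) -> List[Region]:
    """Apply the stages in order, discarding rejected regions early."""
    kept = list(regions)
    for stage in stages:
        kept = [r for r in kept if stage(r)]
        if not kept:
            break  # nothing left for later, costlier stages
    return kept


candidates = [
    Region(0, 0, 20, 10, 120),   # plausible text-like blob
    Region(0, 0, 200, 2, 300),   # long thin line, extreme aspect ratio
    Region(0, 0, 10, 10, 5),     # tiny speck
]
survivors = cascade(candidates, [passes_geometry])
```

In a fuller pipeline the stage list would continue with two more callables wrapping the first SVM and the HoG-based second SVM. Ordering the stages from cheapest to most expensive means the costlier feature extraction runs on fewer regions.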

Taxonomy

Topics: Handwritten Text Recognition Techniques · Vehicle License Plate Recognition · Image Retrieval and Classification Techniques

Methods: Support Vector Machine