TL;DR
This paper investigates the reasons behind lower scene-text recognition accuracy in non-Latin languages and proposes data augmentation strategies, including region-based font search, to significantly improve recognition performance.
Contribution
It identifies key factors affecting non-Latin text recognition accuracy and introduces region-based font augmentation to enhance deep learning models for these languages.
Findings
Word recognition rates (WRRs) on Arabic datasets improve by 24.54% and 2.32%.
WRRs on Devanagari datasets improve by 7.88% and 3.72%.
Font diversity and dataset size are shown to be important for recognition accuracy.
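For readers unfamiliar with the metric, WRR is the percentage of word images whose predicted transcription exactly matches the ground truth. The sketch below is an illustration of that definition, not the authors' evaluation code. The function name `word_recognition_rate` and the Unicode NFC normalization step are assumptions added here; normalization is a common precaution for scripts such as Arabic and Devanagari, where the same visible text can be encoded in more than one way.

```python
import unicodedata


def word_recognition_rate(predictions, ground_truths):
    """Return the percentage of words recognized exactly (WRR).

    Both arguments are equal-length sequences of strings. Strings are
    NFC-normalized before comparison (an assumption of this sketch, not
    something stated in the paper summary).
    """
    if len(predictions) != len(ground_truths):
        raise ValueError("predictions and ground_truths must have equal length")
    if not ground_truths:
        return 0.0

    def norm(s):
        return unicodedata.normalize("NFC", s)

    # Count exact matches at the whole-word level.
    correct = sum(norm(p) == norm(g) for p, g in zip(predictions, ground_truths))
    return 100.0 * correct / len(ground_truths)


if __name__ == "__main__":
    preds = ["नमस्ते", "किताब", "घर", "पानी"]
    truth = ["नमस्ते", "किताब", "नगर", "पानी"]
    print(word_recognition_rate(preds, truth))
```

Because a single wrong character makes the whole word count as an error, WRR is stricter than character-level accuracy.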
Abstract
Scene-text recognition is remarkably better in Latin languages than in non-Latin languages due to several factors such as multiple fonts, simplistic vocabulary statistics, updated data generation tools, and writing systems. This paper examines the possible reasons for low accuracy by comparing English datasets with non-Latin languages. We compare various features like the size (width and height) of the word images and word length statistics. Over the last decade, generating synthetic datasets with powerful deep learning techniques has tremendously improved scene-text recognition. Several controlled experiments are performed on English by varying the number of (i) fonts used to create the synthetic data and (ii) word images created. We discover that these factors are critical for scene-text recognition systems. The English synthetic datasets utilize over 1400 fonts while Arabic and other…
