On-Device Language Identification of Text in Images using Diacritic Characters
Shubham Vatsal, Nikhil Arora, Gopi Ramena, Sukumar Moharana, Dhruval Jain, Naresh Purre, Rachit S Munjal

TL;DR
This paper presents a method for on-device identification of the language of text in images using diacritic characters, improving OCR accuracy for Latin-script languages within mobile constraints.
Contribution
The work introduces a lightweight on-device approach to language detection based on diacritics, improving OCR performance across 13 Latin-script languages.
Findings
Achieved high accuracy in language identification using diacritics.
Improved OCR results when integrating language detection.
Designed a model suitable for mobile devices with low latency.
Abstract
Diacritic characters form a distinctive set of characters that provides a strong clue for identifying a given language with considerably high accuracy. Diacritics, though associated with phonetics, often serve as a distinguishing feature for many languages, especially those written in a Latin script. In this work, we aim to identify the language of text in images using the presence of diacritic characters, in order to improve Optical Character Recognition (OCR) performance in any given automated environment. We showcase our work across 13 Latin languages encompassing 85 diacritic characters. We use an architecture similar to SqueezeDet for object detection of diacritic characters, followed by a shallow network to finally identify the language. OCR systems, when accompanied by the identified language parameter, tend to produce better results than sole deployment…
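The idea of mapping detected diacritics to a language can be illustrated with a minimal Python sketch. It is not the paper's method: the paper uses a shallow network on top of a detector, whereas this sketch swaps in a plain vote count. The `DIACRITICS` table is a small, incomplete, hand-picked subset assumed for illustration (not the paper's 85-character inventory), and `score_languages` and `identify_language` are hypothetical names.

```python
import unicodedata
from collections import Counter


def _nfc_set(chars):
    """Return the set of precomposed (NFC) characters in a string."""
    return set(unicodedata.normalize("NFC", chars))


# Assumption: illustrative, incomplete inventories for four languages only.
DIACRITICS = {
    "German": _nfc_set("äöü"),
    "Spanish": _nfc_set("áéíóúüñ"),
    "French": _nfc_set("àâçéèêëîïôùûü"),
    "Romanian": _nfc_set("ăâîșț"),
}


def score_languages(detected_chars):
    """Count, per language, how many detected characters are in its inventory."""
    scores = Counter()
    for ch in detected_chars:
        ch = unicodedata.normalize("NFC", ch)
        for lang, inventory in DIACRITICS.items():
            if ch in inventory:
                scores[lang] += 1
    return scores


def identify_language(detected_chars):
    """Return the language with the most votes, or None if nothing matched."""
    scores = score_languages(detected_chars)
    if not scores:
        return None
    return scores.most_common(1)[0][0]
```

For example, `identify_language(["ñ", "é", "ó"])` gives Spanish three votes and French one. A character shared by several languages, such as `ü`, cannot decide on its own, which is one reason a learned classifier over all detections is a more natural fit than a lookup table.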
