Bharat Scene Text: A Novel Comprehensive Dataset and Benchmark for Indian Language Scene Text Understanding

Anik De; Abhirama Subramanyam Penamakuri; Rajeev Yadav; Aditya Rathore; Harshiv Shah; Devesh Sharma; Sagar Agarwal; Pravin Kumar; Anand Mishra

arXiv:2511.23071 · cs.CV · April 13, 2026

TL;DR

This paper introduces the Bharat Scene Text Dataset (BSTD), a large-scale benchmark of more than 100K words across 11 Indian languages and English, aimed at advancing Indian language scene text recognition.

Contribution

It provides a comprehensive, annotated dataset and benchmark for Indian language scene text recognition, filling a critical gap in resources and evaluation standards.

Findings

01

State-of-the-art models face challenges adapting to Indian scripts.

02

Fine-tuning English models improves performance on Indian languages.

03

The dataset enables multi-task scene text understanding in Indian languages.

Abstract

Reading scene text, that is, text appearing in images, has numerous application areas, including assistive technology, search, and e-commerce. Although scene text recognition in English has advanced significantly and is often considered nearly a solved problem, Indian language scene text recognition remains an open challenge. This is due to script diversity, non-standard fonts, and varying writing styles, and, more importantly, the lack of high-quality datasets and open-source models. To address these gaps, we introduce the Bharat Scene Text Dataset (BSTD) - a large-scale and comprehensive benchmark for studying Indian Language Scene Text Recognition. It comprises more than 100K words that span 11 Indian languages and English, sourced from over 6,500 scene images captured across various linguistic regions of India. The dataset is meticulously annotated and supports multiple scene text…
