Voice of India: A Large-Scale Benchmark for Real-World Speech Recognition in India

Kaushal Bhogale; Manas Dhir; Amritansh Walecha; Manmeet Kaur; Vanshika Chhabra; Aaditya Pareek; Hanuman Sidh; Sagar Jain; Bhaskar Singh; Utkarsh Singh; Tahir Javed; Shobhit Banga; Mitesh M. Khapra

arXiv:2604.19151 · cs.CL · April 22, 2026

TL;DR

Voice of India is a real-world speech recognition benchmark for 15 Indian languages. It is built from unscripted telephonic conversations and analyzes how audio quality, speaking rate, gender, device type, and geography affect ASR performance.

Contribution

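The paper introduces a large-scale benchmark of unscripted Indian speech whose transcripts account for spelling variation, and it adds district-level geographic performance analysis. Together these address two limitations of prior benchmarks: reliance on scripted, clean speech and strict single-reference WER.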

Findings

01 ASR performance differs across districts.

02 Performance is affected by audio quality, speaking rate, gender, and device type.

03 The dataset includes 536 hours of speech from 36,691 speakers.

Abstract

Existing Indic ASR benchmarks often use scripted, clean speech and leaderboard-driven evaluation that encourages dataset-specific overfitting. In addition, strict single-reference WER penalizes natural spelling variation in Indian languages, including non-standardized spellings of code-mixed English-origin words. To address these limitations, we introduce Voice of India, a closed-source benchmark built from unscripted telephonic conversations covering 15 major Indian languages across 139 regional clusters. The dataset contains 306,230 utterances, totaling 536 hours of speech from 36,691 speakers, with transcripts accounting for spelling variations. We also analyze performance geographically at the district level, revealing disparities. Finally, we provide detailed analysis across factors such as audio quality, speaking rate, gender, and device type, highlighting where current ASR systems…
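
The abstract does not say how the transcripts account for spelling variation, so the following is only an illustrative sketch of one plausible approach, not the authors' method. It assumes each reference word is stored as a set of acceptable spellings and counts a hypothesis word as correct if it matches any of them. The function name `wer_with_variants` and the romanized example words are hypothetical.

```python
from typing import List, Set


def wer_with_variants(reference: List[Set[str]], hypothesis: List[str]) -> float:
    """Word error rate where each reference position accepts a set of spellings.

    Assumption: spelling variants are supplied per word. With singleton sets
    this reduces to ordinary single-reference WER.
    """
    n, m = len(reference), len(hypothesis)
    if n == 0:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and hypothesis[:j]; it starts as the distance from an empty
    # reference (j insertions).
    prev = list(range(m + 1))
    for i in range(1, n + 1):
        curr = [i] + [0] * m  # i deletions against an empty hypothesis
        for j in range(1, m + 1):
            # A hypothesis word is correct if it matches any accepted spelling.
            match = hypothesis[j - 1] in reference[i - 1]
            substitution = prev[j - 1] + (0 if match else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[m] / n


# Hypothetical example: a code-mixed English-origin word with two spellings.
hypothesis = ["mera", "mobail", "number"]
strict = [{"mera"}, {"mobile"}, {"number"}]
lenient = [{"mera"}, {"mobile", "mobail"}, {"number"}]

strict_wer = wer_with_variants(strict, hypothesis)
lenient_wer = wer_with_variants(lenient, hypothesis)
```

Under strict single-reference scoring the alternate spelling counts as a substitution, while the variant-aware reference treats it as correct. Per-word variant sets keep the dynamic program unchanged, whereas scoring against several whole-sentence references would require one alignment per reference.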
