Recognizing Every Voice: Towards Inclusive ASR for Rural Bhojpuri Women

Sakshi Joshi; Eldho Ittan George; Tahir Javed; Kaushal Bhogale; Nikhil Narasimhan; Mitesh M. Khapra

arXiv:2506.09653 · eess.AS · June 12, 2025 · Interspeech

TL;DR

This paper introduces SRUTI, a new benchmark for rural Bhojpuri women, and proposes synthetic speech generation to improve ASR performance, addressing data scarcity and promoting digital inclusion for marginalized communities.

Contribution

It creates the SRUTI benchmark for rural Bhojpuri women and develops a synthetic speech augmentation method to enhance ASR accuracy in low-resource settings.

Findings

1. Current ASR models perform poorly on SRUTI due to data scarcity.

2. Synthetic speech augmentation improves WER by 4.7 points.

3. Minimal data collection (25-30 seconds per speaker) is effective for low-resource ASR.
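The 4.7-point gain above is stated in word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn the ASR output into the reference transcript, divided by the number of reference words. The sketch below shows that calculation under simple assumptions (whitespace tokenization, no text normalization). The function name and the example sentences are illustrative only and are not taken from the paper or its repository; the paper's own scoring setup may differ.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER as a percentage, using word-level Levenshtein distance.

    Assumption: words are separated by whitespace and compared exactly.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr

    return 100.0 * prev[-1] / len(ref)


# Hypothetical example: one substitution ("sat" -> "sit") and one
# deletion ("the") against a six-word reference.
print(round(word_error_rate("the cat sat on the mat", "the cat sit on mat"), 2))  # 33.33
```

On this scale, a 4.7-point improvement means the augmented model makes roughly 4.7 fewer word errors per 100 reference words than the baseline.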

Abstract

Digital inclusion remains a challenge for marginalized communities, especially rural women in low-resource language regions like Bhojpuri. Voice-based access to agricultural services, financial transactions, government schemes, and healthcare is vital for their empowerment, yet existing ASR systems for this group remain largely untested. To address this gap, we create SRUTI, a benchmark consisting of rural Bhojpuri women speakers. Evaluation of current ASR models on SRUTI shows poor performance due to data scarcity, which is difficult to overcome because social and cultural barriers hinder large-scale data collection. To overcome this, we propose generating synthetic speech using just 25-30 seconds of audio per speaker from approximately 100 rural women. Augmenting existing datasets with this synthetic data achieves an improvement of 4.7 WER, providing a scalable, minimally…

Code & Models

Repositories

ai4bharat/sruti (Official)

Taxonomy

Topics: ICT in Developing Communities · Speech Recognition and Synthesis · AI in Service Interactions