Recognizing Every Voice: Towards Inclusive ASR for Rural Bhojpuri Women
Sakshi Joshi, Eldho Ittan George, Tahir Javed, Kaushal Bhogale, Nikhil Narasimhan, Mitesh M. Khapra

TL;DR
This paper introduces SRUTI, a new benchmark for rural Bhojpuri women, and proposes synthetic speech generation to improve ASR performance, addressing data scarcity and promoting digital inclusion for marginalized communities.
Contribution
It creates the SRUTI benchmark for rural Bhojpuri women and develops a synthetic speech augmentation method to enhance ASR accuracy in low-resource settings.
Findings
Current ASR models perform poorly on SRUTI due to data scarcity.
Augmenting existing datasets with synthetic speech improves WER by 4.7 points.
Minimal data collection (25-30 seconds per speaker) is effective for low-resource ASR.
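For readers unfamiliar with the metric: word error rate (WER) is the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the ASR output, divided by the number of reference words. The Python sketch below is illustrative only and is not the paper's evaluation code. The function name `word_error_rate` and the toy English transcripts are hypothetical, and the sketch assumes plain whitespace tokenization with no text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (single-row dynamic programming).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # aligning i reference words with an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in the output
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Toy transcripts (hypothetical, not taken from SRUTI).
reference = "check my account balance"
before = word_error_rate(reference, "check my count balance now")
after = word_error_rate(reference, "check account balance")

# Absolute difference expressed in WER points (percentage points).
points_gained = (before - after) * 100
print(f"WER before: {before:.2%}, after: {after:.2%}, gain: {points_gained:.1f} points")
```

A change quoted in "points" is usually an absolute difference of this kind rather than a relative reduction; the summary above does not state which convention the 4.7-point figure follows.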
Abstract
Digital inclusion remains a challenge for marginalized communities, especially rural women in low-resource language regions such as those where Bhojpuri is spoken. Voice-based access to agricultural services, financial transactions, government schemes, and healthcare is vital for their empowerment, yet existing ASR systems remain largely untested for this group. To address this gap, we create SRUTI, a benchmark consisting of rural Bhojpuri women speakers. Evaluation of current ASR models on SRUTI shows poor performance due to data scarcity, which is difficult to remedy because social and cultural barriers hinder large-scale data collection. To overcome this, we propose generating synthetic speech using just 25-30 seconds of audio per speaker from approximately 100 rural women. Augmenting existing datasets with this synthetic data achieves an improvement of 4.7 WER points, providing a scalable, minimally…
Taxonomy
Topics: ICT in Developing Communities · Speech Recognition and Synthesis · AI in Service Interactions
