LAHAJA: A Robust Multi-accent Benchmark for Evaluating Hindi ASR Systems

Tahir Javed; Janki Nawale; Sakshi Joshi; Eldho George; Kaushal Bhogale; Deovrat Mehendale; Mitesh M. Khapra

arXiv:2408.11440 · cs.CL · August 22, 2024

TL;DR

This paper introduces LAHAJA, a comprehensive Hindi ASR benchmark with diverse accents, and demonstrates that multilingual training improves model robustness, highlighting challenges in recognizing regional accents and specialized vocabulary.

Contribution

The creation of LAHAJA, a large multi-accent Hindi speech benchmark, and the evaluation of models showing the benefits of multilingual training for accent robustness.

Findings

01. Existing models perform poorly on LAHAJA.

02. Multilingual training improves ASR performance.

03. Performance drops for North-East and South Indian speakers.

Abstract

Hindi, one of the most widely spoken languages of India, exhibits a wide range of accents because it is used by speakers from diverse linguistic backgrounds. To enable a robust evaluation of Hindi ASR systems across multiple accents, we create a benchmark, LAHAJA, which contains read and extempore speech on a diverse set of topics and use cases, with a total of 12.5 hours of Hindi audio sourced from 132 speakers spanning 83 districts of India. We evaluate existing open-source and commercial models on LAHAJA and find their performance to be poor. We then train models using different datasets and find that our model trained on multilingual data with good speaker diversity outperforms existing models by a significant margin. We also present a fine-grained analysis which shows that performance declines for speakers from North-East and South India, especially on content heavy in named entities…
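The fine-grained analysis described in the abstract amounts to scoring transcripts separately for each speaker group. The sketch below illustrates one way to do that, assuming word error rate (WER) as the metric; the excerpt above does not name the metric, and the function names, group labels, and toy transcripts are all hypothetical rather than taken from the LAHAJA repository.

```python
from collections import defaultdict


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete a reference word
                            curr[j - 1] + 1,      # insert a hypothesis word
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def wer_by_group(samples):
    """samples: iterable of (group, reference, hypothesis) strings.

    Returns {group: total word errors / total reference words},
    i.e. WER pooled over all utterances in each group.
    """
    errors = defaultdict(int)
    words = defaultdict(int)
    for group, ref, hyp in samples:
        ref_tokens, hyp_tokens = ref.split(), hyp.split()
        errors[group] += edit_distance(ref_tokens, hyp_tokens)
        words[group] += len(ref_tokens)
    return {g: errors[g] / words[g] for g in words if words[g] > 0}


# Toy data only: placeholder tokens and hypothetical group labels.
samples = [
    ("group_a", "a b c d", "a b c d"),
    ("group_b", "a b c d", "a x c"),
]
print(wer_by_group(samples))
```

Pooling errors and reference words per group, instead of averaging per-utterance WER, keeps short utterances from dominating the score; whether the paper aggregates this way is not stated in this excerpt.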

Code & Models

Repositories

ai4bharat/lahaja (Official)

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and dialogue systems · Speech and Audio Processing

Methods: Sparse Evolutionary Training