TL;DR
The paper introduces Vividh-ASR, a complexity-stratified ASR benchmark for Hindi and Malayalam, and proposes R-MFT (reverse multi-stage fine-tuning), a training recipe that lets a 244M Whisper model match or exceed conventionally fine-tuned 769M counterparts.
Contribution
The paper contributes the Vividh-ASR benchmark and the R-MFT training recipe, which together improve speech recognition for low-resource languages and clarify how model adaptation proceeds during fine-tuning.
Findings
Early large parameter updates improve global WER by 12 absolute points.
A hard-to-easy curriculum adds further gains on spontaneous speech.
R-MFT enables a 244M Whisper model to match or exceed conventionally fine-tuned 769M counterparts.
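The abstract reports the WER gain in absolute points, which is easy to confuse with a relative reduction. The following is a minimal sketch of word error rate and the two ways of quoting an improvement. The function names and the 40.0 and 28.0 WER values in the usage note are illustrative assumptions, not the paper's evaluation code or results.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent: word-level edit distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed prefix of ref
    # and the first j words of hyp.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return 100.0 * prev[-1] / len(ref)


def absolute_gain(baseline_wer: float, new_wer: float) -> float:
    """Improvement in WER percentage points (the unit used in the abstract)."""
    return baseline_wer - new_wer


def relative_gain(baseline_wer: float, new_wer: float) -> float:
    """Improvement as a percentage of the baseline WER."""
    return 100.0 * (baseline_wer - new_wer) / baseline_wer
```

With a hypothetical baseline WER of 40.0 and a new WER of 28.0, `absolute_gain` gives 12.0 points while `relative_gain` gives 30.0 percent, so the two figures are not interchangeable.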
Abstract
Fine-tuning multilingual ASR models like Whisper for low-resource languages often improves read speech but degrades spontaneous audio performance, a phenomenon we term studio-bias. To diagnose this mismatch, we introduce Vividh-ASR, a complexity-stratified benchmark for Hindi and Malayalam across four tiers: studio, broadcast, spontaneous, and synthetic noise. Through a controlled study of learning-rate timing and curriculum ordering, we find that early large parameter updates improve global WER by 12 absolute points, while a hard-to-easy curriculum adds gains for spontaneous speech. These findings motivate reverse multi-stage fine-tuning (R-MFT), a training recipe that enables a parameter-efficient 244M Whisper model to match or exceed conventionally fine-tuned 769M counterparts. Representational analysis via CKA and SVD reveals effective schedules concentrate adaptation in the…
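The two ideas the abstract combines, large updates early in training and a hard-to-easy data ordering, can be sketched as a stage plan. This is only an illustration under stated assumptions: the difficulty ranking of the four tiers, the one-tier-per-stage layout, the peak learning rate, and the decay factor are all hypothetical and are not taken from the paper's R-MFT recipe.

```python
from dataclasses import dataclass

# ASSUMPTION: a difficulty ranking for the four tiers named in the abstract.
# The paper may rank or group them differently.
ASSUMED_DIFFICULTY = {
    "studio": 0,
    "broadcast": 1,
    "spontaneous": 2,
    "synthetic_noise": 3,
}


@dataclass(frozen=True)
class Stage:
    tier: str
    learning_rate: float


def hard_to_easy_stages(tiers, peak_lr=1e-4, decay=0.5):
    """Order tiers hardest first and give earlier stages larger learning rates.

    peak_lr and decay are hypothetical values chosen for illustration.
    """
    ordered = sorted(tiers, key=lambda t: ASSUMED_DIFFICULTY[t], reverse=True)
    return [Stage(tier, peak_lr * decay**i) for i, tier in enumerate(ordered)]


stages = hard_to_easy_stages(["studio", "broadcast", "spontaneous", "synthetic_noise"])
for stage in stages:
    print(f"{stage.tier}: lr={stage.learning_rate:.2e}")
```

Each `Stage` would then drive one fine-tuning pass over the corresponding tier. How stages are actually defined and scheduled in R-MFT should be taken from the paper itself.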
Code & Models
- 🤗 adalat-ai/whisper-medium-ml-high-lr · model · 60 dl
- 🤗 adalat-ai/whisper-small-ml-high-lr · model · 54 dl
- 🤗 adalat-ai/whisper-small-ml-rmft · model · 60 dl
- 🤗 adalat-ai/whisper-medium-ml-rmft · model · 107 dl · ♡ 1
- 🤗 adalat-ai/whisper-medium-hi-high-lr · model · 71 dl
- 🤗 adalat-ai/whisper-medium-hi-rmft · model · 94 dl
- 🤗 adalat-ai/whisper-small-hi-high-lr · model · 79 dl
- 🤗 adalat-ai/whisper-small-hi-rmft · model · 75 dl
