TL;DR
This paper proposes a weighted cross-entropy method to improve multilingual speech recognition for low-resource languages, achieving a 6.69% WER reduction on the low-resource language without harming performance on the high-resource languages.
Contribution
It applies language-weighted dynamic cross-entropy, a loss typically used for imbalanced datasets, to integrate a low-resource language into a pre-trained multilingual ASR model (Whisper) in a continual multilingual learning setting.
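The core idea of language-weighted cross-entropy can be illustrated with a small sketch. The summary does not give the paper's weighting formula or explain what makes it "dynamic," so everything below is an assumption for illustration only. The sketch uses inverse-frequency weights (w_l = N / (K * n_l)) recomputed from each batch, which means the weights shift as the language mix changes. The names `language_weights` and `weighted_cross_entropy` are hypothetical, and "lr" is a placeholder code for the low-resource language.

```python
import math
from collections import Counter


def language_weights(batch_languages):
    """Hypothetical inverse-frequency weights: w_l = N / (K * n_l).

    Recomputed from each batch, so the weights change as the language mix
    changes. This is an assumed reading of "dynamic", not the paper's formula.
    """
    counts = Counter(batch_languages)
    n, k = len(batch_languages), len(counts)
    return {lang: n / (k * c) for lang, c in counts.items()}


def weighted_cross_entropy(batch, weights):
    """Token-level cross-entropy where each token inherits its utterance's language weight.

    batch: list of (language, [p_1, ..., p_T]) pairs, where p_t is the model's
    probability for the reference token at step t.
    Returns the weighted mean negative log-likelihood (normalized by total weight).
    """
    total_loss = 0.0
    total_weight = 0.0
    for lang, token_probs in batch:
        w = weights[lang]
        for p in token_probs:
            total_loss += w * -math.log(p)
            total_weight += w
    return total_loss / total_weight


# Toy batch: three high-resource ("en") utterances and one low-resource ("lr") utterance.
batch = [("en", [0.9]), ("en", [0.9]), ("en", [0.9]), ("lr", [0.5])]
weights = language_weights([lang for lang, _ in batch])
loss = weighted_cross_entropy(batch, weights)
```

The rare language receives a larger weight, so its poorly predicted tokens contribute more to the loss than they would under an unweighted mean. Normalizing by the summed weights mirrors how weighted mean-reduced cross-entropy is commonly computed. Other normalizations are possible.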
Findings
6.69% WER reduction for the low-resource language compared to fine-tuning without the approach
48.86% WER reduction for the low-resource language compared to the original Whisper model
Average 3.29% WER reduction across all six languages
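The findings are expressed in word error rate (WER), defined as the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. The summary does not state whether the reported percentages are absolute (percentage-point) or relative reductions. The sketch below therefore computes both. The function names are illustrative, and the numbers in the usage lines are made up, not taken from the paper.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # d[i][j] = minimum edits to turn ref[:i] into hyp[:j] (Levenshtein over words)
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
    return d[len(ref)][len(hyp)] / len(ref)


def absolute_reduction(baseline_wer, new_wer):
    """Difference in percentage points (inputs given in percent)."""
    return baseline_wer - new_wer


def relative_reduction(baseline_wer, new_wer):
    """Reduction as a percentage of the baseline WER."""
    return (baseline_wer - new_wer) / baseline_wer * 100.0


# One substitution (sat -> sit) and one deletion ("the") over six reference words.
example_wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
# Illustrative, made-up WERs: 20% -> 15% is 5 points absolute, 25% relative.
abs_drop = absolute_reduction(20.0, 15.0)
rel_drop = relative_reduction(20.0, 15.0)
```

The same improvement can therefore be reported as very different percentages depending on the convention. When comparing the numbers above with other work, check which convention the paper uses.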
Abstract
This paper addresses the challenge of integrating low-resource languages into multilingual automatic speech recognition (ASR) systems. We introduce a novel application of weighted cross-entropy, typically used for unbalanced datasets, to facilitate the integration of low-resource languages into pre-trained multilingual ASR models within the context of continual multilingual learning. We fine-tune the Whisper multilingual ASR model on five high-resource languages and one low-resource language, employing language-weighted dynamic cross-entropy and data augmentation. The results show a remarkable 6.69% word error rate (WER) reduction for the low-resource language compared to the fine-tuned model without applying our approach, and a 48.86% WER reduction compared to the original Whisper model. In addition, our approach yields an average WER reduction of 3.29% across the six languages,…
