Improving Whisper's Recognition Performance for Under-Represented Language Kazakh Leveraging Unpaired Speech and Text
Jinpeng Li, Yu Pu, Qi Sun, Wei-Qiang Zhang

TL;DR
This paper improves Whisper's speech recognition accuracy for Kazakh by leveraging unpaired speech and text data, integrating a GPT language model, and fine-tuning on pseudo-labeled data, achieving more than 10% absolute WER reduction and showing potential for other low-resource languages.
Contribution
It introduces a novel approach combining unpaired data, GPT-based language modeling, and pseudo-labeling to improve recognition in under-represented languages.
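The summary does not spell out how the GPT language model is combined with Whisper. One common way to combine an external language model with an ASR decoder is shallow fusion, where the language model's log-probability is added to the ASR log-probability with a weight. The sketch below illustrates that general idea only; the function name `fuse_scores`, the weight `lm_weight`, the candidate labels, and all probabilities are hypothetical and not taken from the paper.

```python
import math
from typing import Dict


def fuse_scores(
    asr_logprobs: Dict[str, float],
    lm_logprobs: Dict[str, float],
    lm_weight: float = 0.3,  # hypothetical weight, not from the paper
) -> Dict[str, float]:
    """Shallow fusion: score = ASR log-prob + lm_weight * LM log-prob."""
    floor = math.log(1e-10)  # score for candidates the LM has not seen
    return {
        token: asr_lp + lm_weight * lm_logprobs.get(token, floor)
        for token, asr_lp in asr_logprobs.items()
    }


# Toy next-token candidates with made-up probabilities.
asr = {"cand_a": math.log(0.45), "cand_b": math.log(0.55)}
lm = {"cand_a": math.log(0.90), "cand_b": math.log(0.10)}

fused = fuse_scores(asr, lm)
best_asr_only = max(asr, key=asr.get)
best_fused = max(fused, key=fused.get)
print(best_asr_only, best_fused)
```

In this toy case the acoustic model alone slightly prefers `cand_b`, but the language model's strong preference for `cand_a` flips the decision after fusion.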
Findings
Over 10% absolute WER reduction on Kazakh speech recognition.
Effective use of unpaired speech and text data for model fine-tuning.
Potential generalization to other low-resource languages.
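Because the headline result is stated as an absolute WER reduction, it may help to see how absolute and relative reductions differ. The sketch below computes word error rate with a standard word-level edit distance; the sentences and the resulting numbers are invented for illustration and are not results from the paper.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the ref prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Invented example: 10 reference words, 4 errors before and 2 errors after.
reference = "one two three four five six seven eight nine ten"
baseline = "one two tree for five six seven ate nine"
improved = "one two three four five six seven ate nine"

wer_before = wer(reference, baseline)
wer_after = wer(reference, improved)
absolute_reduction = wer_before - wer_after            # difference in WER points
relative_reduction = absolute_reduction / wer_before   # fraction of the baseline WER
print(wer_before, wer_after, absolute_reduction, relative_reduction)
```

Here WER drops from 40% to 20%, which is a 20-point absolute reduction but a 50% relative reduction; the paper's "more than 10%" figure is of the absolute kind.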
Abstract
Whisper and other large-scale automatic speech recognition models have made significant progress in performance. However, their performance on many low-resource languages, such as Kazakh, is not satisfactory. It is worth researching how to utilize low-cost data to improve the performance of Whisper on under-represented languages. In this study, we utilized easily accessible unpaired speech and text data and combined the language model GPT with Whisper on Kazakh. We implemented an end-of-transcript (EOT) judgment modification and a hallucination penalty to improve the performance of speech recognition. Further, we employed the average token log probability during decoding as a criterion to select samples from unlabeled speech data and used the pseudo-labeled data to fine-tune the model to further improve its performance. Ultimately, we achieved more than 10% absolute WER reduction in multiple…
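The sample-selection step described in the abstract can be sketched as follows: decode each unlabeled utterance, average the log-probabilities of the decoded tokens, and keep only hypotheses whose average clears a threshold. This is a minimal illustration under stated assumptions; the `Hypothesis` container, the function names, the threshold value of -0.3, and the toy log-probabilities are all hypothetical and are not given in the source.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hypothesis:
    audio_id: str
    text: str                    # decoded transcript (the pseudo-label)
    token_logprobs: List[float]  # log-probability of each decoded token


def avg_logprob(hyp: Hypothesis) -> float:
    """Average token log-probability; empty hypotheses get -inf."""
    if not hyp.token_logprobs:
        return float("-inf")
    return sum(hyp.token_logprobs) / len(hyp.token_logprobs)


def select_pseudo_labels(
    hyps: List[Hypothesis],
    threshold: float = -0.3,  # hypothetical cutoff, not from the paper
) -> List[Hypothesis]:
    """Keep hypotheses the model decoded with high average confidence."""
    return [h for h in hyps if avg_logprob(h) >= threshold]


# Toy decoding results with made-up scores.
hyps = [
    Hypothesis("utt1", "confident transcript", [-0.05, -0.10, -0.15]),
    Hypothesis("utt2", "uncertain transcript", [-1.20, -0.90, -1.50]),
    Hypothesis("utt3", "", []),
]

selected = select_pseudo_labels(hyps)
selected_ids = [h.audio_id for h in selected]
print(selected_ids)
```

Only the selected (audio, transcript) pairs would then be used as fine-tuning data; averaging over tokens rather than summing keeps the criterion from penalizing longer utterances.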
Taxonomy
Topics: Interpreting and Communication in Healthcare · Hate Speech and Cyberbullying Detection
Methods: Cosine Annealing · Softmax · Dense Connections · Dropout · Linear Layer · Attention Dropout · Residual Connection · Linear Warmup With Cosine Annealing · Discriminative Fine-Tuning
