Using fine-tuning and min lookahead beam search to improve Whisper
Andrea Do, Oscar Brown, Zhengjie Wang, Nikhil Mathew, Zixin Liu, Jawwad Ahmed, Cheng Yu

TL;DR
This paper enhances Whisper's performance on low-resource languages by fine-tuning with LoRA and introducing Min Lookahead beam search, significantly reducing WER across multiple languages.
Contribution
The paper proposes an improved decoding algorithm, Min Lookahead, and shows that LoRA fine-tuning and Min Lookahead decoding both significantly improve Whisper's accuracy.
Findings
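38.49 WER reduction on Vietnamese from LoRA fine-tuning of Whisper-Tiny, relative to zero-shot Whisper-Tiny
2.26 average WER reduction across a range of languages using Filter-Ends and Min Lookahead instead of standard beam search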
Theoretical proof that Min Lookahead outperforms standard beam search
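The findings above are all stated in terms of WER (word error rate). As a reminder of what that metric measures, here is a minimal sketch of the usual definition: the word-level edit distance between reference and hypothesis, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenisation are assumptions made for illustration; the paper's own evaluation pipeline (text normalisation, scoring tool, and whether the reported reductions are absolute points) is not described in this summary.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()  # assumption: plain whitespace tokenisation
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of two reference words.
    print(word_error_rate("a b", "a x"))
```

WER is often reported multiplied by 100, and it can exceed 1.0 (or 100) when the hypothesis contains many inserted words.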
Abstract
The performance of Whisper in low-resource languages is still far from perfect. In addition to a lack of training data on low-resource languages, we identify some limitations in the beam search algorithm used in Whisper. To address these issues, we fine-tune Whisper on additional data and propose an improved decoding algorithm. On the Vietnamese language, fine-tuning Whisper-Tiny with LoRA improves WER by 38.49 over the zero-shot Whisper-Tiny setting, a further reduction of 1.45 compared to full-parameter fine-tuning. Additionally, using the Filter-Ends and Min Lookahead decoding algorithms reduces WER by 2.26 on average over a range of languages compared to standard beam search. These results generalise to larger Whisper model sizes. We also prove a theorem that Min Lookahead outperforms the standard beam search algorithm used in Whisper.
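For readers unfamiliar with the baseline the abstract compares against, the following is a minimal sketch of generic beam search over a toy next-token model. It is not Whisper's decoder and it is not Filter-Ends or Min Lookahead, whose details are not given in this summary. The names `beam_search` and `toy_model`, the token table, and the `<eos>` marker are all hypothetical and chosen only for illustration.

```python
import math
from typing import Callable, Dict, List, Tuple

Sequence = Tuple[str, ...]


def beam_search(
    next_log_probs: Callable[[Sequence], Dict[str, float]],
    beam_size: int,
    max_len: int,
    eos: str = "<eos>",
) -> Tuple[Sequence, float]:
    """Keep the beam_size highest-scoring partial sequences at each step."""
    beams: List[Tuple[Sequence, float]] = [((), 0.0)]
    finished: List[Tuple[Sequence, float]] = []
    for _ in range(max_len):
        # Expand every live beam by every possible next token.
        candidates = [
            (seq + (tok,), score + lp)
            for seq, score in beams
            for tok, lp in next_log_probs(seq).items()
        ]
        if not candidates:
            break
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_size]:
            # Sequences that emit the end token leave the live beam.
            (finished if seq[-1] == eos else beams).append((seq, score))
        if not beams:
            break
    return max(finished or beams, key=lambda c: c[1])


def toy_model(prefix: Sequence) -> Dict[str, float]:
    """Hypothetical model in which the greedy first token is a trap."""
    table = {
        (): {"a": math.log(0.6), "b": math.log(0.4)},
        ("a",): {"x": math.log(0.5), "y": math.log(0.5)},
        ("b",): {"z": math.log(0.9), "w": math.log(0.1)},
    }
    return table.get(prefix, {"<eos>": 0.0})


if __name__ == "__main__":
    print(beam_search(toy_model, beam_size=1, max_len=5))  # greedy decoding
    print(beam_search(toy_model, beam_size=2, max_len=5))  # wider beam
```

In this toy table, a beam of size 1 commits to "a" and ends with total probability 0.3, while a beam of size 2 keeps "b" alive and finds the sequence with probability 0.36. Pruning on the accumulated score alone is the kind of limitation that improved decoding strategies aim to address.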
Taxonomy
Topics: Numerical Methods and Algorithms
Methods: Lookahead
