Using fine-tuning and min lookahead beam search to improve Whisper

Andrea Do; Oscar Brown; Zhengjie Wang; Nikhil Mathew; Zixin Liu; Jawwad Ahmed; Cheng Yu

arXiv:2309.10299 · eess.AS · September 20, 2023

TL;DR

This paper improves Whisper's performance on low-resource languages by fine-tuning with LoRA and by replacing standard beam search with the Min Lookahead decoding algorithm, reducing WER across multiple languages.

Contribution

It proposes an improved decoding algorithm, Min Lookahead, and demonstrates that fine-tuning with LoRA and decoding with Min Lookahead each improve Whisper's accuracy.
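As a rough illustration of the LoRA idea mentioned above, the sketch below adds a trainable low-rank update to a frozen weight matrix. It is a minimal, dependency-free sketch and not the authors' implementation: the function names, the matrix sizes, the rank, and the `alpha` scaling value are all assumptions chosen for illustration, since this page does not state the configuration used in the paper.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_forward(x, w, a, b, alpha):
    """Compute x @ W + (alpha / r) * x @ A @ B.

    w: frozen base weight, shape (d_in, d_out)
    a: trainable down-projection, shape (d_in, r)
    b: trainable up-projection, shape (r, d_out)
    alpha: scaling hyperparameter (hypothetical value, not from the paper)
    """
    r = len(b)  # the rank is the number of rows of B
    scale = alpha / r
    base = matmul(x, w)
    update = matmul(matmul(x, a), b)
    return [[p + scale * q for p, q in zip(base_row, upd_row)]
            for base_row, upd_row in zip(base, update)]


def trainable_fraction(d_in, d_out, r):
    """Fraction of parameters trained by LoRA relative to the full matrix."""
    return r * (d_in + d_out) / (d_in * d_out)


# Toy example: a 2x2 identity base weight with a rank-1 update.
x = [[1.0, 2.0]]
w = [[1.0, 0.0], [0.0, 1.0]]
a = [[1.0], [1.0]]
b_zero = [[0.0, 0.0]]   # B starts at zero, so the model is unchanged at first
b_trained = [[1.0, 1.0]]

y_initial = lora_forward(x, w, a, b_zero, alpha=2.0)
y_trained = lora_forward(x, w, a, b_trained, alpha=2.0)
```

With the up-projection initialised to zero, the adapted layer reproduces the frozen layer exactly, and only `a` and `b` would receive gradient updates during fine-tuning. For an illustrative 512 by 512 matrix and rank 8, `trainable_fraction(512, 512, 8)` shows that only a small share of the weights is trained.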

Findings

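01. 38.49 WER reduction on Vietnamese from LoRA fine-tuning of Whisper-Tiny, compared with zero-shot Whisper-Tiny

02. 2.26 average WER reduction across a range of languages using Filter-Ends and Min Lookahead instead of standard beam search

03. Theoretical proof that Min Lookahead outperforms the standard beam search used in Whisper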

Abstract

The performance of Whisper in low-resource languages is still far from perfect. In addition to a lack of training data on low-resource languages, we identify some limitations in the beam search algorithm used in Whisper. To address these issues, we fine-tune Whisper on additional data and propose an improved decoding algorithm. On the Vietnamese language, fine-tuning Whisper-Tiny with LoRA leads to an improvement of 38.49 in WER over the zero-shot Whisper-Tiny setting, which is a further reduction of 1.45 compared to full-parameter fine-tuning. Additionally, by using Filter-Ends and Min Lookahead decoding algorithms, the WER reduces by 2.26 on average over a range of languages compared to standard beam search. These results generalise to larger Whisper model sizes. We also prove a theorem that Min Lookahead outperforms the standard beam search algorithm used in Whisper.
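The results above are stated as differences in word error rate (WER). As a minimal sketch of how such a figure can be computed, the code below implements the standard WER definition (word-level edit distance divided by the number of reference words) and an absolute difference between two systems. It assumes the reported numbers are absolute differences in WER expressed as percentages, which the abstract does not state explicitly. The function names and the example sentences are hypothetical and are not taken from the paper.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate in percent: word-level edit distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return 100.0 * prev[-1] / len(ref)


def absolute_wer_reduction(baseline_wer: float, improved_wer: float) -> float:
    """Absolute drop in WER between a baseline and an improved system."""
    return baseline_wer - improved_wer


# Hypothetical transcripts, for illustration only.
reference = "the cat sat on the mat"
baseline = wer(reference, "the cat sit on mat")      # one substitution, one deletion
improved = wer(reference, "the cat sat on the mat")  # exact match
reduction = absolute_wer_reduction(baseline, improved)
```

Under this reading, a "reduction of 2.26" would mean the WER percentage fell by 2.26, rather than falling by 2.26% of its previous value.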

Taxonomy

Topics: Numerical Methods and Algorithms

Methods: Lookahead