A Multitask Training Approach to Enhance Whisper with Contextual Biasing and Open-Vocabulary Keyword Spotting
Yuang Li, Min Zhang, Chang Su, Yinglu Li, Xiaosong Qiao, Mengxin Ren, Miaomiao Ma, Daimeng Wei, Shimin Tao, Hao Yang

TL;DR
This paper presents KWS-Whisper, a multitask training method that enhances Whisper ASR with open-vocabulary keyword spotting to better recognize rare named entities and improve overall accuracy.
Contribution
It introduces a novel multitask training approach that integrates open-vocabulary keyword spotting into Whisper, improving recognition of rare entities and enabling plug-and-play enhancements.
Findings
Significant improvement in entity recall over the original Whisper model on the Chinese Aishell hot word subsets and two internal code-switching test sets.
OV-KWS module enhances error correction in Whisper models.
Multitask training effectively combines OV-KWS and ASR tasks.
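To make the multitask idea concrete, here is a minimal sketch of one common way to combine two objectives: a weighted sum of a keyword-spotting loss and an ASR loss. This is an illustrative assumption, not the authors' formulation; the function names, the choice of binary cross-entropy for keyword detection, and the `kws_weight` parameter are all hypothetical.

```python
import math


def binary_cross_entropy(probs, labels):
    """Mean BCE over candidate keywords (label 1 = keyword is spoken)."""
    eps = 1e-12
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)


def token_cross_entropy(token_probs):
    """Mean negative log-likelihood of the reference transcript tokens."""
    return -sum(math.log(p) for p in token_probs) / len(token_probs)


def multitask_loss(kws_probs, kws_labels, asr_token_probs, kws_weight=1.0):
    """Hypothetical joint objective: ASR loss plus a weighted OV-KWS loss."""
    kws = binary_cross_entropy(kws_probs, kws_labels)
    asr = token_cross_entropy(asr_token_probs)
    return asr + kws_weight * kws
```

In this sketch, `kws_probs` are per-keyword detection probabilities and `asr_token_probs` are the probabilities the decoder assigns to the reference tokens; a real system would compute both from model outputs in a framework such as PyTorch.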
Abstract
The recognition of rare named entities, such as personal names and terminologies, is challenging for automatic speech recognition (ASR) systems, especially when they are not frequently observed in the training data. In this paper, we introduce keyword spotting enhanced Whisper (KWS-Whisper), a novel ASR system that leverages the Whisper model and performs open-vocabulary keyword spotting (OV-KWS) on the hidden states of the Whisper encoder to recognize user-defined named entities. These entities serve as prompts for the Whisper decoder. To optimize the model, we propose a multitask training approach that learns OV-KWS and contextual-ASR tasks. We evaluate our approach on Chinese Aishell hot word subsets and two internal code-switching test sets and show that it significantly improves the entity recall compared to the original Whisper model. Moreover, we demonstrate that the OV-KWS can…
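The flow described in the abstract, where spotted entities become prompts for the decoder, can be sketched as follows. This is a simplified illustration under stated assumptions: the detection scores are made up, and `select_keywords`, `build_prompt`, the threshold value, and the comma-separated prompt format are hypothetical rather than taken from the paper.

```python
def select_keywords(scores, threshold=0.5):
    """Keep user-defined entities whose detection score meets the threshold."""
    return [keyword for keyword, score in scores.items() if score >= threshold]


def build_prompt(keywords, separator=", "):
    """Join the detected entities into a single text prompt for the decoder."""
    return separator.join(keywords)


# Hypothetical OV-KWS scores for three user-defined entities.
scores = {"Aishell": 0.91, "Whisper": 0.12, "OpenReview": 0.67}
detected = select_keywords(scores)
prompt = build_prompt(detected)
```

With the open-source `openai-whisper` package, a string like `prompt` could be passed through the `initial_prompt` argument of `model.transcribe`; whether KWS-Whisper injects its prompts this way is not stated in the text above.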
Taxonomy
Topics: Advanced Text Analysis Techniques
