Adapting Whisper for Parameter-efficient Code-Switching Speech Recognition via Soft Prompt Tuning
Hongli Yang, Yizhou Peng, Hao Huang, Sheng Li

TL;DR
This paper adapts the Whisper ASR model for code-switching scenarios using soft prompt tuning, achieving improved performance in low-resource settings while maintaining parameter efficiency and knowledge retention.
Contribution
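It introduces two soft prompt tuning (SPT) strategies for Whisper: (1) full fine-tuning of the soft prompts together with the entire model, and (2) training only the soft prompts while the model stays frozen. It also proposes SPT4ASR, a combination of SPT variants that further improves code-switching speech recognition.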
Findings
Deep prompt tuning outperforms other SPT methods.
SPT4ASR achieves further error reductions on code-switching ASR.
Parameter efficiency and performance on previously learned languages are preserved.
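Deep prompt tuning differs from shallow prompt tuning in where the trainable vectors enter the network. The toy sketch below illustrates the difference under stated assumptions: plain Python lists stand in for tensors, the hypothetical `toy_mixing_layer` (which adds the sequence mean to every position) stands in for a Transformer layer, and deep prompts follow one common formulation in which each layer's prompt outputs are discarded and replaced by that layer's own trainable prompts. It is not the paper's implementation.

```python
from typing import Callable, List

Vector = List[float]
Sequence = List[Vector]
Layer = Callable[[Sequence], Sequence]


def toy_mixing_layer(h: Sequence) -> Sequence:
    """Hypothetical stand-in for a Transformer layer: adds the sequence mean
    to every position, so prompt vectors can influence the real tokens."""
    dim = len(h[0])
    mean = [sum(v[d] for v in h) / len(h) for d in range(dim)]
    return [[v[d] + mean[d] for d in range(dim)] for v in h]


def shallow_prompt_forward(layers: List[Layer], prompt: Sequence, x: Sequence) -> Sequence:
    """Shallow SPT: trainable prompt vectors are prepended once, at the input."""
    h = prompt + x
    for layer in layers:
        h = layer(h)
    return h[len(prompt):]  # keep only outputs at the original token positions


def deep_prompt_forward(layers: List[Layer], prompts: List[Sequence], x: Sequence) -> Sequence:
    """Deep SPT (assumed formulation): every layer gets its own trainable prompts."""
    assert len(prompts) == len(layers), "one prompt set per layer"
    h = x
    for layer, prompt in zip(layers, prompts):
        out = layer(prompt + h)
        h = out[len(prompt):]  # drop prompt outputs; the next layer gets fresh prompts
    return h


if __name__ == "__main__":
    layers = [toy_mixing_layer, toy_mixing_layer]
    x = [[1.0], [3.0]]  # two one-dimensional "frames"
    print(shallow_prompt_forward(layers, [[2.0]], x))
    print(deep_prompt_forward(layers, [[[2.0]], [[0.0]]], x))
```

In the frozen-model setting, only the prompt vectors would be updated during training and the layers would stay fixed; deep prompting simply gives the optimizer a separate set of such vectors at every layer instead of one set at the input.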
Abstract
Large-scale multilingual ASR models like Whisper excel in high-resource settings but face challenges in low-resource scenarios, such as rare languages and code-switching (CS), due to computational costs and catastrophic forgetting. We explore Soft Prompt Tuning (SPT), a parameter-efficient method to enhance CS ASR while preserving prior knowledge. We evaluate two strategies: (1) full fine-tuning (FFT) of both soft prompts and the entire Whisper model, demonstrating improved cross-lingual capabilities compared to traditional methods, and (2) adhering to SPT's original design by freezing model parameters and only training soft prompts. Additionally, we introduce SPT4ASR, a combination of different SPT variants. Experiments on the SEAME and ASRU2019 datasets show that deep prompt tuning is the most effective SPT approach, and our SPT4ASR methods achieve further error reductions in CS ASR,…
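The two strategies in the abstract differ mainly in which parameters receive gradient updates. A minimal accounting sketch, assuming each set of soft prompts is a plain `prompt_len × d_model` embedding matrix inserted at `num_layers` points; the `PromptConfig` class and all numbers are hypothetical illustrations. The 244M backbone, hidden size 768, and 12 layers are the sizes OpenAI reports for Whisper small, chosen only as an example and not necessarily the model used in the paper.

```python
from dataclasses import dataclass


@dataclass
class PromptConfig:
    prompt_len: int  # soft prompt vectors per insertion point (hypothetical)
    d_model: int     # hidden size of the model
    num_layers: int  # insertion points; 1 = shallow (input only), >1 = deep


def prompt_params(cfg: PromptConfig) -> int:
    """Trainable parameters contributed by the soft prompts alone."""
    return cfg.prompt_len * cfg.d_model * cfg.num_layers


def trainable_params(cfg: PromptConfig, backbone_params: int, freeze_backbone: bool) -> int:
    """Strategy (2) freezes the backbone; strategy (1), FFT, also updates it."""
    extra = 0 if freeze_backbone else backbone_params
    return prompt_params(cfg) + extra


if __name__ == "__main__":
    backbone = 244_000_000  # assumed backbone size (Whisper small, per OpenAI)
    shallow = PromptConfig(prompt_len=16, d_model=768, num_layers=1)
    deep = PromptConfig(prompt_len=16, d_model=768, num_layers=12)
    for name, cfg in [("shallow", shallow), ("deep", deep)]:
        frozen = trainable_params(cfg, backbone, freeze_backbone=True)
        fft = trainable_params(cfg, backbone, freeze_backbone=False)
        print(f"{name}: frozen={frozen:,} ({frozen / fft:.4%} of FFT), fft={fft:,}")
```

With these assumed sizes, the prompt-only strategy trains well under 0.1% of the parameters that FFT updates, which is where SPT's parameter efficiency comes from. Deep prompts cost `num_layers` times more than shallow ones but remain small next to the backbone.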
