Are Soft Prompts Good Zero-shot Learners for Speech Recognition?
Dianwen Ng, Chong Zhang, Ruixi Zhang, Yukun Ma, Fabian Ritter-Gutierrez, Trung Hieu Nguyen, Chongjia Ni, Shengkui Zhao, Eng Siong Chng, Bin Ma

TL;DR
This paper investigates soft prompt tuning in speech recognition, revealing its role as a zero-shot learner that improves generalization and robustness, while also highlighting vulnerabilities and proposing enhancements for out-of-distribution noise adaptation.
Contribution
It provides a deeper understanding of soft prompts in ASR, identifying their functions and vulnerabilities, and proposes modifications to improve zero-shot noise adaptation.
Findings
Soft prompts act as zero-shot learners in ASR.
They enhance robustness against background noise.
Modified noise prompts enable zero-shot adaptation to new noise environments.
Abstract
Large self-supervised pre-trained speech models require computationally expensive fine-tuning for downstream tasks. Soft prompt tuning offers a simple parameter-efficient alternative by utilizing minimal soft prompt guidance, enhancing portability while also maintaining competitive performance. However, there is limited understanding of how and why this is so. In this study, we aim to deepen our understanding of this emerging method by investigating the role of soft prompts in automatic speech recognition (ASR). Our findings highlight their role as zero-shot learners in improving ASR performance, but also reveal their vulnerability to malicious modifications. Soft prompts aid generalization but are not obligatory for inference. We also identify two primary roles of soft prompts: content refinement and noise information enhancement, which enhances robustness against background noise. Additionally, we…
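To make the idea of soft prompt tuning concrete, the following is a minimal sketch, not the authors' implementation. It assumes the common formulation in which a small set of trainable vectors is prepended to the input features of a frozen model and the outputs at the prompt positions are discarded. All names (`make_soft_prompts`, `frozen_encoder`, `encode`) are hypothetical, and the encoder is a toy stand-in rather than a real pre-trained speech model.

```python
import random


def make_soft_prompts(num_prompts, dim, seed=0):
    """Randomly initialised prompt vectors (hypothetical helper).

    In prompt tuning these would be the only trained parameters;
    the backbone model stays frozen.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(num_prompts)]


def frozen_encoder(sequence):
    """Toy stand-in for a frozen encoder (assumption, not a real ASR model).

    Each position is mixed with the sequence mean, so every output
    depends on the whole input, loosely imitating self-attention.
    """
    if not sequence:
        return []
    dim = len(sequence[0])
    mean = [sum(vec[d] for vec in sequence) / len(sequence) for d in range(dim)]
    return [[0.5 * x + 0.5 * m for x, m in zip(vec, mean)] for vec in sequence]


def encode(frames, prompts=None):
    """Prepend prompts along the time axis, encode, then drop prompt positions."""
    prompts = prompts or []
    if prompts and frames and len(prompts[0]) != len(frames[0]):
        raise ValueError("prompt and frame dimensions must match")
    sequence = [list(p) for p in prompts] + [list(f) for f in frames]
    outputs = frozen_encoder(sequence)
    return outputs[len(prompts):]


frames = [[1.0, 2.0], [3.0, 4.0]]      # two feature frames of dimension 2
prompts = make_soft_prompts(3, 2)      # three soft prompt vectors

out_with = encode(frames, prompts)     # prompted inference
out_without = encode(frames)           # the same frozen model, no prompts
```

Both calls return one output vector per input frame. In this toy setting the prompts change the frame representations only through the shared context, and the model still runs when they are omitted, which mirrors the abstract's remark that soft prompts are not obligatory for inference.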
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
