Are Soft Prompts Good Zero-shot Learners for Speech Recognition?

Dianwen Ng; Chong Zhang; Ruixi Zhang; Yukun Ma; Fabian Ritter-Gutierrez; Trung Hieu Nguyen; Chongjia Ni; Shengkui Zhao; Eng Siong Chng; Bin Ma

arXiv:2309.09413·cs.SD·September 19, 2023

TL;DR

This paper investigates soft prompt tuning in speech recognition, revealing its role as a zero-shot learner that improves generalization and robustness, while also highlighting vulnerabilities and proposing enhancements for out-of-distribution noise adaptation.

Contribution

It provides a deeper understanding of soft prompts in ASR, identifying their functions and vulnerabilities, and proposes modifications to improve zero-shot noise adaptation.

Findings

1. Soft prompts act as zero-shot learners in ASR.

2. They enhance robustness against background noise.

3. Modified noise prompts enable zero-shot adaptation to new noise environments.

Abstract

Large self-supervised pre-trained speech models require computationally expensive fine-tuning for downstream tasks. Soft prompt tuning offers a simple, parameter-efficient alternative that relies on minimal soft prompt guidance, enhancing portability while maintaining competitive performance. However, how and why soft prompts work remains poorly understood. In this study, we aim to deepen our understanding of this emerging method by investigating the role of soft prompts in automatic speech recognition (ASR). Our findings highlight the role of soft prompts as zero-shot learners in improving ASR performance, and show that this role also leaves them vulnerable to malicious modifications. Soft prompts aid generalization but are not obligatory for inference. We also identify two primary roles of soft prompts: content refinement and noise information enhancement, the latter enhancing robustness against background noise. Additionally, we…
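As a rough illustration of why soft prompt tuning is parameter-efficient, the sketch below prepends a few learnable vectors to a sequence of speech frame features while leaving the features (and, by implication, the pre-trained model) untouched. It is a minimal sketch under stated assumptions, not the paper's implementation: prepending the prompt along the time axis is a common prompt-tuning layout assumed here, and the names make_soft_prompt, prepend_soft_prompt, and count_trainable, along with the sizes used, are hypothetical.

```python
import random


def make_soft_prompt(num_prompts, dim, seed=0):
    """Create `num_prompts` small random vectors of size `dim`.

    In prompt tuning these vectors would be the only trainable parameters;
    the pre-trained speech model itself stays frozen (assumption of this sketch).
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(num_prompts)]


def prepend_soft_prompt(prompt, frames):
    """Return a new sequence with the prompt vectors placed before the frames."""
    dim = len(prompt[0])
    if any(len(vec) != dim for vec in frames):
        raise ValueError("prompt and frame feature dimensions must match")
    # Copy every vector so the caller's prompt and frames are not aliased.
    return [list(vec) for vec in prompt] + [list(vec) for vec in frames]


def count_trainable(prompt):
    """Number of scalar parameters held by the soft prompt."""
    return sum(len(vec) for vec in prompt)


# Hypothetical sizes: 4 prompt vectors, 8-dimensional features, 10 frames.
prompt = make_soft_prompt(num_prompts=4, dim=8)
frames = [[0.0] * 8 for _ in range(10)]
extended = prepend_soft_prompt(prompt, frames)
print(len(extended), count_trainable(prompt))
```

The extended sequence is what a frozen encoder would consume in this layout; only the prompt entries would receive gradient updates, which is why the trainable parameter count stays small regardless of model size.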

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing