Prompting and Adapter Tuning for Self-supervised Encoder-Decoder Speech Model

Kai-Wei Chang; Ming-Hsin Chen; Yun-Ping Lin; Jing Neng Hsu; Paul Kuo-Ming Huang; Chien-yu Huang; Shang-Wen Li; Hung-yi Lee

arXiv:2310.02971 · eess.AS · November 16, 2023

TL;DR

This paper shows that prompting on Wav2Seq, a self-supervised encoder-decoder speech model, surpasses previous speech prompting work on sequence generation tasks and competes with fine-tuning in the low-resource scenario. In cross-lingual ASR, prompting and adapter tuning outperform conventional fine-tuning when the number of trainable parameters is limited.

Contribution

The paper applies prompting and adapter tuning to Wav2Seq, a self-supervised encoder-decoder speech model, and shows that both are effective for sequence generation and cross-lingual ASR.

Findings

01

Prompting is competitive with fine-tuning in low-resource scenarios.

02

Prompting on Wav2Seq achieves a 53% relative improvement in word error rate for ASR over previous speech prompting work.

03

Prompting outperforms adapter tuning in low-resource settings.

Abstract

Prompting and adapter tuning have emerged as efficient alternatives to fine-tuning (FT) methods. However, existing studies on speech prompting focused on classification tasks and failed on more complex sequence generation tasks. Besides, adapter tuning is primarily applied with a focus on encoder-only self-supervised models. Our experiments show that prompting on Wav2Seq, a self-supervised encoder-decoder model, surpasses previous works in sequence generation tasks. It achieves a remarkable 53% relative improvement in word error rate for ASR and a 27% in F1 score for slot filling. Additionally, prompting competes with the FT method in the low-resource scenario. Moreover, we show the transferability of prompting and adapter tuning on Wav2Seq in cross-lingual ASR. When limited trainable parameters are involved, prompting and adapter tuning consistently outperform conventional FT across 7…
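The abstract reports its gains as relative improvements. As a minimal sketch of how such a figure is usually computed, the hypothetical helper `relative_improvement` below divides the change in a metric by its baseline value. The WER and F1 values in the example are illustrative assumptions only and are not results from the paper.

```python
def relative_improvement(baseline: float, new: float, lower_is_better: bool = True) -> float:
    """Return the improvement of `new` over `baseline` as a fraction of `baseline`.

    For error metrics such as WER, lower is better, so the gain is
    (baseline - new) / baseline. For scores such as F1, higher is better,
    so the gain is (new - baseline) / baseline.
    """
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    delta = (baseline - new) if lower_is_better else (new - baseline)
    return delta / baseline


if __name__ == "__main__":
    # Illustrative numbers only (assumed, NOT taken from the paper).
    baseline_wer, prompted_wer = 20.0, 9.4
    print(f"Relative WER improvement: {relative_improvement(baseline_wer, prompted_wer):.0%}")

    baseline_f1, prompted_f1 = 0.60, 0.75
    gain = relative_improvement(baseline_f1, prompted_f1, lower_is_better=False)
    print(f"Relative F1 improvement: {gain:.0%}")
```

A relative figure depends on the baseline: the same absolute drop in WER yields a larger relative improvement when the baseline error is small, so relative and absolute gains are best read together.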

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and dialogue systems

Methods: Adapter · Focus