SpeechPrompt: Prompting Speech Language Models for Speech Processing Tasks
Kai-Wei Chang, Haibin Wu, Yu-Kai Wang, Yuan-Kuei Wu, Hua Shen, Wei-Cheng Tseng, Iu-thing Kang, Shang-Wen Li, Hung-yi Lee

TL;DR
This paper introduces SpeechPrompt, a unified prompting framework for speech processing tasks using speech language models and quantized speech units, achieving competitive results with minimal training.
Contribution
It pioneers the use of prompting in speech language models, reformulating speech tasks into speech-to-unit generation within a unified framework.
Findings
Competitive performance with fine-tuning methods
Effective in few-shot learning scenarios
Versatile for classification, generation, and synthesis tasks
Abstract
Prompting has become a practical method for utilizing pre-trained language models (LMs). This approach offers several advantages. It allows an LM to adapt to new tasks with minimal training and parameter updates, thus achieving efficiency in both storage and computation. Additionally, prompting modifies only the LM's inputs and harnesses the generative capabilities of language models to address various downstream tasks in a unified manner. This significantly reduces the need for human labor in designing task-specific models. These advantages become even more evident as the number of tasks served by the LM scales up. Motivated by the strengths of prompting, we are the first to explore the potential of prompting speech LMs in the domain of speech processing. Recently, there has been a growing interest in converting speech into discrete units for language modeling. Our pioneer research…
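To make the idea that prompting "modifies only the LM's inputs" concrete, here is a minimal sketch of input-level prompting over discrete speech units. It is an illustration under stated assumptions, not the authors' implementation: the names `build_prompted_input` and `count_params`, the embedding size, the unit vocabulary size, and the prompt length are all hypothetical, and a small frozen embedding table stands in for the pre-trained speech LM.

```python
import random

# All sizes below are hypothetical and chosen only for illustration.
EMBED_DIM = 4     # embedding width of the stand-in "speech LM"
NUM_UNITS = 100   # size of the discrete speech-unit vocabulary
PROMPT_LEN = 3    # number of trainable prompt vectors


def build_prompted_input(prompt_vectors, unit_ids, embedding_table):
    """Prepend prompt vectors to the embeddings of the discrete speech units."""
    unit_embeddings = [embedding_table[u] for u in unit_ids]
    return prompt_vectors + unit_embeddings


def count_params(vectors):
    """Count scalar parameters in a list of equally sized vectors."""
    return sum(len(v) for v in vectors)


random.seed(0)

# Frozen part: stands in for the pre-trained LM's parameters (never updated).
embedding_table = [
    [random.gauss(0.0, 1.0) for _ in range(EMBED_DIM)] for _ in range(NUM_UNITS)
]

# Trainable part: the only parameters a new task would update or store.
prompt_vectors = [[0.0] * EMBED_DIM for _ in range(PROMPT_LEN)]

# An utterance already quantized into discrete unit IDs (made-up values).
unit_ids = [17, 17, 42, 8]

lm_input = build_prompted_input(prompt_vectors, unit_ids, embedding_table)
trainable = count_params(prompt_vectors)
frozen = count_params(embedding_table)

print(f"LM input length: {len(lm_input)} vectors")
print(f"Trainable parameters per task: {trainable}")
print(f"Frozen parameters shared across tasks: {frozen}")
```

In this sketch, each additional task would add only one more set of prompt vectors while the frozen parameters are shared, which is the storage and computation advantage the abstract describes as the number of tasks grows.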
