SpeechPrompt: Prompting Speech Language Models for Speech Processing Tasks

Kai-Wei Chang; Haibin Wu; Yu-Kai Wang; Yuan-Kuei Wu; Hua Shen; Wei-Cheng Tseng; Iu-thing Kang; Shang-Wen Li; Hung-yi Lee

arXiv:2408.13040·eess.AS·August 26, 2024

TL;DR

This paper introduces SpeechPrompt, a unified prompting framework for speech processing tasks using speech language models and quantized speech units, achieving competitive results with minimal training.

Contribution

It pioneers the use of prompting in speech language models, reformulating speech tasks into speech-to-unit generation within a unified framework.

Findings

01. Performance competitive with fine-tuning methods

02. Effective in few-shot learning scenarios

03. Versatile across classification, generation, and synthesis tasks

Abstract

Prompting has become a practical method for utilizing pre-trained language models (LMs). This approach offers several advantages. It allows an LM to adapt to new tasks with minimal training and parameter updates, thus achieving efficiency in both storage and computation. Additionally, prompting modifies only the LM's inputs and harnesses the generative capabilities of language models to address various downstream tasks in a unified manner. This significantly reduces the need for human labor in designing task-specific models. These advantages become even more evident as the number of tasks served by the LM scales up. Motivated by the strengths of prompting, we are the first to explore the potential of prompting speech LMs in the domain of speech processing. Recently, there has been a growing interest in converting speech into discrete units for language modeling. Our pioneer research…
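To make the idea of "converting speech into discrete units" concrete, here is a minimal Python sketch. It assumes the units come from assigning each frame-level feature vector to its nearest centroid in a pre-learned codebook, followed by collapsing consecutive repeats, which is a common but optional step. The centroids, frame values, and function names are hypothetical toy values chosen for illustration; they are not taken from the paper.

```python
from itertools import groupby
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(frames: Sequence[Sequence[float]],
             centroids: Sequence[Sequence[float]]) -> List[int]:
    """Map each frame to the index of its nearest centroid (its 'unit')."""
    return [
        min(range(len(centroids)),
            key=lambda k: squared_distance(frame, centroids[k]))
        for frame in frames
    ]


def collapse_repeats(units: Sequence[int]) -> List[int]:
    """Merge runs of identical consecutive units into a single unit."""
    return [unit for unit, _ in groupby(units)]


# Hypothetical toy codebook (3 units) and 5 two-dimensional feature frames.
# A real system would use learned centroids over high-dimensional features.
centroids = [[0.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
frames = [[0.1, -0.1], [0.2, 0.1], [0.9, 1.2], [1.1, 0.8], [0.1, 1.9]]

units = quantize(frames, centroids)
deduped = collapse_repeats(units)

print(units)    # [0, 0, 1, 1, 2]
print(deduped)  # [0, 1, 2]
```

The resulting integer sequence plays the same role for a speech LM that token IDs play for a text LM, which is what lets different tasks be posed as unit generation.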
