Exploring In-Context Learning of Textless Speech Language Model for Speech Classification Tasks

Ming-Hao Hsu; Kai-Wei Chang; Shang-Wen Li; Hung-yi Lee

arXiv:2310.12477·eess.AS·June 18, 2024·1 cites

TL;DR

This paper investigates in-context learning (ICL) for textless speech language models on speech classification tasks. It shows that warmup training lets such a model perform unseen classification tasks from input demonstrations alone, without task-specific parameter updates.

Contribution

This is the first work to explore and enable in-context learning in textless speech language models for speech classification tasks.

Findings

01 Current speech LMs lack ICL capability.

02 Warmup training equips speech LMs with demonstration learning.

03 First speech LM capable of unseen classification via ICL.

Abstract

Ever since the development of GPT-3 in the natural language processing (NLP) field, in-context learning (ICL) has played an essential role in utilizing large language models (LLMs). When presented with utterance-label demonstrations at the input, the LM can accomplish few-shot learning without relying on gradient descent or requiring explicit modification of its parameters. This enables the LM to perform various downstream tasks in a black-box manner. Despite the success of ICL in NLP, little work has explored the possibility of ICL in speech processing. This study is the first work exploring ICL for speech classification tasks with a textless speech LM. We first show that the current speech LM lacks the ICL capability. We then perform warmup training on the speech LM, equipping the LM with demonstration learning capability. This paper explores and proposes the first speech LM capable of…
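
As a rough illustration of the utterance-label demonstration format the abstract describes, the sketch below assembles an ICL prompt as a flat token sequence. It assumes each utterance has already been encoded as a list of discrete unit ids, which is a common setup for textless speech LMs; the function name `build_icl_prompt`, the separator ids, and the label ids are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

# Hypothetical special token ids (assumption); a real speech LM
# would reserve entries in its own vocabulary for these roles.
SEP_TOKEN = -1  # boundary between an utterance and its label
END_TOKEN = -2  # end of one demonstration


def build_icl_prompt(
    demos: Sequence[Tuple[List[int], int]],
    query: List[int],
) -> List[int]:
    """Concatenate (utterance units, label) demonstrations, then the query.

    The prompt ends with SEP_TOKEN so that the next token the LM
    predicts can be read as the label of the query utterance.
    """
    prompt: List[int] = []
    for units, label in demos:
        prompt.extend(units)      # discrete units of the demonstration utterance
        prompt.append(SEP_TOKEN)
        prompt.append(label)      # label id paired with that utterance
        prompt.append(END_TOKEN)
    prompt.extend(query)          # units of the utterance to classify
    prompt.append(SEP_TOKEN)
    return prompt


# Two demonstrations with made-up unit ids and label ids, then a query.
demos = [([11, 12, 13], 100), ([21, 22], 101)]
prompt = build_icl_prompt(demos, query=[31, 32])
```

No parameters are updated anywhere in this sketch: the task is conveyed only through the demonstrations placed in the input, which is the property of ICL that the abstract highlights.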

Taxonomy

Topics: Topic Modeling · Speech Recognition and Synthesis · Natural Language Processing Techniques

Methods: Multi-Head Attention · Attention Is All You Need · Cosine Annealing · Linear Warmup With Cosine Annealing · Linear Layer · Layer Normalization · Attention Dropout · Softmax · Dense Connections