Seal: Advancing Speech Language Models to be Few-Shot Learners

Shuyu Lei; Lingen Liu; Jiaolong Yang; Yasen Jiao; Yuxiang Yang; Yushu Yang; Xiang Guo

arXiv:2407.14875 · cs.CL · July 23, 2024

TL;DR

Seal is a speech language model that brings few-shot, in-prompt learning to a combined speech-and-language setting. It performs robustly on two speech understanding tasks and across different pre-trained language models.

Contribution

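The paper introduces Seal, a multi-modal speech language model, together with a novel alignment method: a projector trained with a Kullback-Leibler divergence loss bridges a frozen speech encoder and a frozen language model decoder, enabling few-shot learning on speech understanding tasks.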

Findings

01. Seal performs robustly as a few-shot learner on two speech understanding tasks.

02. The alignment method improves cross-modal transfer and robustness.

03. Consistency experiments validate its robustness across different pre-trained language models.

Abstract

Existing auto-regressive language models have demonstrated a remarkable capability to perform a new task with just a few examples in the prompt, without requiring any additional training. To extend this capability to a multi-modal setting (i.e., speech and language), this paper introduces the Seal model, an abbreviation for speech language model. It incorporates a novel alignment method, in which a Kullback-Leibler divergence loss is used to train a projector that bridges a frozen speech encoder with a frozen language model decoder. The resulting Seal model exhibits robust performance as a few-shot learner on two speech understanding tasks. Additionally, consistency experiments are conducted to validate its robustness on different pre-trained language models.
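
The abstract names a Kullback-Leibler divergence loss but does not say which distributions it compares. As a rough illustration only, the sketch below assumes one plausible setup: the same frozen language model produces next-token logits once from text input and once from projected speech features, and the loss is the mean per-position KL between the two resulting distributions. The function names (`softmax`, `kl_divergence`, `alignment_loss`) and the toy logits are hypothetical and not taken from the paper; in an actual training loop only the projector's parameters would be updated.

```python
import math

def softmax(logits):
    """Convert raw logits to a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)

def alignment_loss(text_logits, speech_logits):
    """Mean per-position KL between text- and speech-conditioned predictions.

    Assumption (not from the paper): both arguments are lists of per-position
    logit vectors from the same frozen language model, fed text embeddings in
    one pass and projector outputs in the other.
    """
    assert len(text_logits) == len(speech_logits)
    per_position = [kl_divergence(softmax(t), softmax(s))
                    for t, s in zip(text_logits, speech_logits)]
    return sum(per_position) / len(per_position)

# Toy logits over a 3-token vocabulary at 2 positions (made-up values).
text_logits = [[2.0, 0.5, -1.0], [0.1, 1.5, 0.3]]
speech_logits = [[1.0, 1.0, 0.0], [0.1, 1.5, 0.3]]
loss = alignment_loss(text_logits, speech_logits)
```

The loss is zero when the speech-conditioned predictions match the text-conditioned ones and grows as they diverge, which is the property that would drive the projector toward outputs the frozen decoder treats like text.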

Taxonomy

Topics: Natural Language Processing Techniques