BLSP: Bootstrapping Language-Speech Pre-training via Behavior Alignment of Continuation Writing

Chen Wang; Minpeng Liao; Zhongqiang Huang; Jinliang Lu; Junhong Wu; Yuchen Liu; Chengqing Zong; Jiajun Zhang

arXiv:2309.00916·cs.CL·May 29, 2024·2 cites

Open Access · 1 Repo

TL;DR

This paper introduces BLSP, a novel approach that aligns speech and text behaviors in language models by training a modality adapter, enabling speech-related tasks without extensive speech instruction data.

Contribution

BLSP proposes a lightweight modality adapter trained via behavior alignment, bridging speech and text in LLMs without relying on large speech instruction datasets.
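The data flow behind this contribution can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `adapt`, `build_alignment_example`, and `frozen_llm_continue` are hypothetical, the adapter is reduced to average pooling plus a linear projection, and the idea that the training target is the LLM's own continuation of the transcript is an interpretation of "behavior alignment of continuation writing".

```python
from typing import Callable, Dict, List

Vector = List[float]


def adapt(frames: List[Vector], weights: List[Vector], stride: int) -> List[Vector]:
    """Hypothetical adapter: average every `stride` frames, then project linearly.

    `weights` has one row per output dimension. Trailing frames that do not
    fill a complete window are dropped.
    """
    outputs = []
    for start in range(0, len(frames) - stride + 1, stride):
        window = frames[start:start + stride]
        dim = len(window[0])
        # Downsample: mean over the frames in this window.
        pooled = [sum(frame[i] for frame in window) / stride for i in range(dim)]
        # Project the pooled vector into the (assumed) LLM embedding space.
        outputs.append([sum(w * x for w, x in zip(row, pooled)) for row in weights])
    return outputs


def build_alignment_example(
    speech_frames: List[Vector],
    transcript: str,
    frozen_llm_continue: Callable[[str], str],
    weights: List[Vector],
    stride: int = 2,
) -> Dict[str, object]:
    """Pair adapted speech features with the text continuation of the transcript.

    The speech encoder and the LLM are treated as frozen; in this sketch only
    `weights` would be updated during training.
    """
    target = frozen_llm_continue(transcript)  # what the LLM writes given text
    inputs = adapt(speech_frames, weights, stride)  # what the LLM sees given speech
    return {"adapter_outputs": inputs, "target_continuation": target}
```

A training loop would then fit `weights` so that the LLM, fed `adapter_outputs`, produces `target_continuation`. No speech instruction data is involved in building the pair, only speech with its transcript.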

Findings

01

Enables speech recognition, translation, and understanding with LLMs.

02

Supports zero-shot cross-lingual speech tasks.

03

Does not require large-scale speech instruction data.

Abstract

The emergence of large language models (LLMs) has sparked significant interest in extending their remarkable language capabilities to speech. However, modality alignment between speech and text still remains an open problem. Current solutions can be categorized into two strategies. One is a cascaded approach where outputs (tokens or states) of a separately trained speech recognition system are used as inputs for LLMs, which limits their potential in modeling alignment between speech and text. The other is an end-to-end approach that relies on speech instruction data, which is very difficult to collect in large quantities. In this paper, we address these issues and propose the BLSP approach that Bootstraps Language-Speech Pre-training via behavior alignment of continuation writing. We achieve this by learning a lightweight modality adapter between a frozen speech encoder and an LLM,…

Code & Models

Repositories

cwang621/blsp (PyTorch, official)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis

Methods: Adapter