BLSP-Emo: Towards Empathetic Large Speech-Language Models

Chen Wang; Minpeng Liao; Zhongqiang Huang; Junhong Wu; Chengqing Zong; Jiajun Zhang

arXiv:2406.03872·cs.CL·June 7, 2024

TL;DR

BLSP-Emo introduces an end-to-end speech-language model that understands semantics and emotions, generating empathetic responses by leveraging existing ASR and SER datasets through a two-stage pretraining process.

Contribution

The paper presents a two-stage pretraining approach that builds an empathetic speech-language model from existing ASR and SER datasets, so the model captures emotion in speech as well as semantics.

Findings

1. BLSP-Emo effectively comprehends speech and emotions.
2. The model generates empathetic responses in conversations.
3. It outperforms baseline models in instruction-following tasks.

Abstract

The recent release of GPT-4o showcased the potential of end-to-end multimodal models, not just in terms of low latency but also in their ability to understand and generate expressive speech with rich emotions. While the details are unknown to the open research community, it likely involves significant amounts of curated data and compute, neither of which is readily accessible. In this paper, we present BLSP-Emo (Bootstrapped Language-Speech Pretraining with Emotion support), a novel approach to developing an end-to-end speech-language model capable of understanding both semantics and emotions in speech and generating empathetic responses. BLSP-Emo utilizes existing speech recognition (ASR) and speech emotion recognition (SER) datasets through a two-stage process. The first stage focuses on semantic alignment, following recent work on pretraining speech-language models using ASR data. The…

Code & Models

Repositories

cwang621/blsp-emo (PyTorch, official)

Models

cwang621/blsp-emo (Hugging Face model; 18 downloads, 3 likes)

Videos

BLSP-Emo: Towards Empathetic Large Speech-Language Models

Taxonomy

Topics: Topic Modeling