E-chat: Emotion-sensitive Spoken Dialogue System with Large Language Models

Hongfei Xue; Yuhao Liang; Bingshen Mu; Shiliang Zhang; Mengzhe Chen; Qian Chen; Lei Xie

arXiv:2401.00475 · cs.SD · July 30, 2024 · 1 citation


Open Access

TL;DR

E-chat is a novel spoken dialogue system that uses emotion embeddings and large language models to understand and respond to emotional speech, improving emotional comprehension in human-machine interactions.

Contribution

The paper introduces E-chat, a new emotion-sensitive dialogue system that integrates speech emotion embeddings with LLMs and presents the E-chat200 dataset for emotion-aware dialogue research.

Findings

01

E-chat outperforms baseline models in emotional comprehension tasks.

02

The system effectively responds to different emotional contexts.

03

E-chat200 dataset facilitates emotion-sensitive dialogue research.

Abstract

This study focuses on emotion-sensitive spoken dialogue in human-machine speech interaction. With the advancement of Large Language Models (LLMs), dialogue systems can handle multimodal data, including audio. Recent models have improved the understanding of complex audio signals by integrating various audio events. However, they cannot generate appropriate responses to emotional speech. To address this, we introduce the Emotional chat Model (E-chat), a novel spoken dialogue system capable of comprehending and responding to emotions conveyed in speech. The model combines an emotion embedding extracted by a speech encoder with an LLM, enabling it to respond according to different emotional contexts. Additionally, we introduce the E-chat200 dataset, designed explicitly for emotion-sensitive spoken dialogue. On various evaluation metrics, E-chat…
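The abstract says E-chat combines an emotion embedding from a speech encoder with an LLM but does not spell out the fusion step. One common pattern for this kind of system, assumed here purely for illustration, is to linearly project the embedding to the LLM's hidden size and prepend it to the text-token embeddings as an extra "soft token". The function names (`linear_project`, `build_llm_input`), the toy dimensions, and the prepend strategy are all hypothetical and are not E-chat's documented implementation.

```python
import random
from typing import List

Vector = List[float]


def linear_project(x: Vector, weight: List[Vector], bias: Vector) -> Vector:
    """Compute W @ x + b with plain lists; weight has shape (out_dim, in_dim)."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(weight, bias)]


def build_llm_input(emotion_emb: Vector, text_embs: List[Vector],
                    weight: List[Vector], bias: Vector) -> List[Vector]:
    """Hypothetical fusion (an assumption, not the paper's confirmed method):
    project the emotion embedding into the LLM hidden size and prepend it to
    the text-token embeddings as one extra soft token."""
    if len(weight[0]) != len(emotion_emb):
        raise ValueError("projection input size must match emotion embedding size")
    soft_token = linear_project(emotion_emb, weight, bias)
    if any(len(t) != len(soft_token) for t in text_embs):
        raise ValueError("text embeddings must share the LLM hidden size")
    return [soft_token] + text_embs


if __name__ == "__main__":
    rng = random.Random(0)
    emo_dim, hidden = 4, 3  # toy sizes; real encoders and LLMs are far larger
    emotion = [rng.uniform(-1, 1) for _ in range(emo_dim)]
    W = [[rng.uniform(-1, 1) for _ in range(emo_dim)] for _ in range(hidden)]
    b = [0.0] * hidden
    tokens = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]  # stand-in text-token embeddings
    seq = build_llm_input(emotion, tokens, W, b)
    print(len(seq), len(seq[0]))  # 3 3: one soft token plus two text tokens
```

Prepending keeps the LLM's input interface unchanged (it still receives a sequence of hidden-size vectors), which is why this style of soft-prompt fusion is a frequent choice; a real system would learn the projection jointly with, or on top of, a frozen encoder.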

Taxonomy

Topics: Speech and dialogue systems · Emotion and Mood Recognition · Speech Recognition and Synthesis