Integrating Emotion Recognition with Speech Recognition and Speaker Diarisation for Conversations

Wen Wu; Chao Zhang; Philip C. Woodland

arXiv:2308.07145·eess.AS·August 15, 2023

TL;DR

This paper presents an integrated system that jointly trains emotion recognition, speech recognition, and speaker diarisation, so that conversations can be analysed without manually segmented utterances.

Contribution

It introduces a multi-task learning framework with a shared encoder and distinct output layers for automatic emotion recognition (AER), automatic speech recognition (ASR), voice activity detection (VAD), and speaker classification, improving performance over separate models.
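
The shared-encoder layout described above can be illustrated with a toy sketch in plain Python. It is not the authors' implementation: the class name, layer sizes, random weights, and the choice to emit per-frame outputs from every head are all assumptions made for illustration.

```python
import math
import random

random.seed(0)  # deterministic toy weights


def linear(in_dim, out_dim):
    """Return a toy bias-free linear layer as a closure over random weights."""
    weights = [[random.uniform(-0.1, 0.1) for _ in range(in_dim)]
               for _ in range(out_dim)]

    def apply(x):
        return [sum(w * v for w, v in zip(row, x)) for row in weights]

    return apply


class SharedEncoderMultiTask:
    """Hypothetical model: one shared encoder, one output layer per sub-task."""

    def __init__(self, feat_dim, hidden_dim, head_sizes):
        self.encoder = linear(feat_dim, hidden_dim)
        self.heads = {task: linear(hidden_dim, size)
                      for task, size in head_sizes.items()}

    def forward(self, frames):
        # The shared representation is computed once per input frame.
        hidden = [[math.tanh(v) for v in self.encoder(f)] for f in frames]
        # Every head reads the same hidden states and produces its own scores.
        return {task: [head(h) for h in hidden]
                for task, head in self.heads.items()}


# Assumed output sizes: 4 emotions, 30 tokens, speech/non-speech, 2 speakers.
HEAD_SIZES = {"aer": 4, "asr": 30, "vad": 2, "speaker": 2}
model = SharedEncoderMultiTask(feat_dim=8, hidden_dim=16, head_sizes=HEAD_SIZES)
frames = [[random.random() for _ in range(8)] for _ in range(5)]
outputs = model.forward(frames)
```

Because all four heads consume the same hidden states, a gradient from any one task would update the shared encoder in a jointly trained version of this layout.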

Findings

01. Outperforms baseline systems on the IEMOCAP dataset
02. Achieves better emotion recognition accuracy with automatic segmentation
03. Improves speaker classification and transcription quality

Abstract

Although automatic emotion recognition (AER) has recently drawn significant research interest, most current AER studies use manually segmented utterances, which are usually unavailable for dialogue systems. This paper proposes integrating AER with automatic speech recognition (ASR) and speaker diarisation (SD) in a jointly-trained system. Distinct output layers are built for four sub-tasks including AER, ASR, voice activity detection and speaker classification based on a shared encoder. Taking the audio of a conversation as input, the integrated system finds all speech segments and transcribes the corresponding emotion classes, word sequences, and speaker identities. Two metrics are proposed to evaluate AER performance with automatic segmentation based on time-weighted emotion and speaker classification errors. Results on the IEMOCAP dataset show that the proposed system consistently…
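
The abstract mentions time-weighted error metrics but this excerpt does not define them. As a rough sketch only, the function below assumes a formulation in the style of diarisation error rate: missed speech, false-alarm speech, and time with the wrong label are summed and divided by the total reference speech time. The function names and the `(start, end, label)` segment format are hypothetical, and the paper's exact definitions may differ.

```python
def label_at(segments, t):
    """Return the label of the segment covering time t, or None for non-speech."""
    for start, end, label in segments:
        if start <= t < end:
            return label
    return None


def time_weighted_error(reference, hypothesis):
    """Assumed DER-style score: (missed + false alarm + confused) / reference time.

    Segments are (start_sec, end_sec, label) tuples and are assumed not to
    overlap within each list.
    """
    # Labels are constant between any two consecutive segment boundaries.
    bounds = sorted({t for seg in reference + hypothesis for t in seg[:2]})
    missed = false_alarm = confused = total = 0.0
    for left, right in zip(bounds, bounds[1:]):
        duration = right - left
        midpoint = (left + right) / 2
        ref = label_at(reference, midpoint)
        hyp = label_at(hypothesis, midpoint)
        if ref is not None:
            total += duration
            if hyp is None:
                missed += duration      # reference speech with no hypothesis
            elif hyp != ref:
                confused += duration    # speech found, wrong label
        elif hyp is not None:
            false_alarm += duration     # hypothesis speech over reference silence
    if total == 0:
        raise ValueError("reference contains no speech")
    return (missed + false_alarm + confused) / total


reference = [(0.0, 4.0, "happy"), (5.0, 8.0, "sad")]
hypothesis = [(0.0, 3.0, "happy"), (3.0, 6.0, "sad"), (6.0, 8.0, "neutral")]
error = time_weighted_error(reference, hypothesis)
print(round(error, 3))  # 0.571
```

In this example, 3 seconds carry the wrong emotion label and 1 second is a false alarm, against 7 seconds of reference speech, giving 4/7. Under the same assumption, passing speaker identities instead of emotion classes as labels would score speaker classification errors.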

Code & Models

Repositories

w-wu/steer (PyTorch, official)