Whose Emotion Matters? Speaking Activity Localisation without Prior Knowledge

Hugo Carneiro; Cornelius Weber; Stefan Wermter

arXiv:2211.15377 · eess.AS · August 16, 2023 · 1 citation

TL;DR

This paper improves emotion recognition in conversations by realigning MELD videos using active speaker detection and speech recognition, enabling better facial expression analysis and outperforming vision-only models.

Contribution

It introduces MELD-FAIR, a realigned version of MELD with accurate speaker localization, and demonstrates enhanced emotion recognition performance using this data.

Findings

1. Realigned MELD-FAIR videos match the transcriptions more closely.
2. An emotion recognition model trained on MELD-FAIR outperforms state-of-the-art vision-only models.
3. Facial cues from localized speakers are more informative for ERC.

Abstract

The task of emotion recognition in conversations (ERC) benefits from the availability of multiple modalities, as provided, for example, in the video-based Multimodal EmotionLines Dataset (MELD). However, only a few research approaches use both acoustic and visual information from the MELD videos. There are two reasons for this. First, label-to-video alignments in MELD are noisy, making those videos an unreliable source of emotional speech data. Second, conversations can involve several people in the same scene, which requires localising the source of each utterance. In this paper, we introduce MELD with Fixed Audiovisual Information via Realignment (MELD-FAIR). Using recent active speaker detection and automatic speech recognition models, we realign the videos of MELD and capture the facial expressions of speakers in 96.92% of the utterances provided in MELD.…
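The abstract does not detail how the realignment works. The following is a minimal Python sketch of the general idea, built on our own assumptions. It assumes that ASR produces word-level timestamps and that an active speaker detector produces per-face-track speaking scores over time. First, the labelled transcript is matched against the ASR words to find the utterance's time span. Second, the face track that speaks most during that span is chosen. The names `Word`, `realign`, `pick_speaker`, the `slack` parameter, and the input formats are hypothetical and are not taken from the meld-fair repository. Word-level matching with `difflib.SequenceMatcher` is a stand-in technique, not the authors' method.

```python
from __future__ import annotations

from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Word:
    """One ASR word with timestamps in seconds (hypothetical format)."""
    text: str
    start: float
    end: float


def _norm(token: str) -> str:
    # Lower-case and drop punctuation so "doin'?" matches "doin".
    return "".join(c for c in token.lower() if c.isalnum())


def realign(transcript: str, asr_words: list[Word],
            slack: int = 2) -> tuple[float, float, float] | None:
    """Find the ASR word span that best matches the labelled transcript.

    Returns (start, end, similarity), or None if no word matches.
    `slack` (hypothetical) lets a span be a few words longer than the label.
    """
    target = [t for t in (_norm(w) for w in transcript.split()) if t]
    hyp = [_norm(w.text) for w in asr_words]
    best_score, best_i, best_j = 0.0, 0, 0
    for i in range(len(hyp)):
        for j in range(i + 1, min(len(hyp), i + len(target) + slack) + 1):
            # Word-level similarity between the label and a candidate span.
            score = SequenceMatcher(None, target, hyp[i:j]).ratio()
            if score > best_score:
                best_score, best_i, best_j = score, i, j
    if best_score == 0.0:
        return None
    return asr_words[best_i].start, asr_words[best_j - 1].end, best_score


def pick_speaker(asd_scores: dict[str, list[tuple[float, float]]],
                 start: float, end: float) -> str | None:
    """Pick the face track with the highest mean speaking score in [start, end].

    `asd_scores` maps a face-track id to (timestamp, score) pairs, as an
    active speaker detector might produce (assumed format).
    """
    best_id, best_mean = None, float("-inf")
    for track_id, samples in asd_scores.items():
        inside = [s for t, s in samples if start <= t <= end]
        if inside and sum(inside) / len(inside) > best_mean:
            best_id, best_mean = track_id, sum(inside) / len(inside)
    return best_id


words = [Word("oh", 0.0, 0.2), Word("hey", 0.3, 0.5), Word("how", 1.0, 1.2),
         Word("you", 1.25, 1.4), Word("doin", 1.45, 1.8), Word("uh", 2.0, 2.1)]
span = realign("How you doin'?", words)
print(span)  # (1.0, 1.8, 1.0)
tracks = {"face_0": [(0.5, 0.1), (1.2, 0.2), (1.6, 0.1)],
          "face_1": [(0.5, 0.9), (1.2, 0.8), (1.6, 0.7)]}
print(pick_speaker(tracks, span[0], span[1]))  # face_1
```

A word-level similarity ratio tolerates small ASR differences, such as extra fillers around the utterance. Its cost grows with clip length times transcript length, which is acceptable for utterance-sized clips.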

Code & Models

Repositories

knowledgetechnologyuhh/meld-fair (PyTorch, official)

Taxonomy

Topics: Speech and Audio Processing · Emotion and Mood Recognition · Speech Recognition and Synthesis