AMuSE: Adaptive Multimodal Analysis for Speaker Emotion Recognition in Group Conversations

Naresh Kumar Devulapally; Sidharth Anand; Sreyasee Das Bhattacharjee; Junsong Yuan; Yu-Ping Chang

arXiv:2401.15164 · cs.SD · January 30, 2024 · 1 citation



Open Access

TL;DR

AMuSE introduces an adaptive multimodal analysis framework that effectively captures cross-modal interactions and contextual cues to improve speaker emotion recognition in group conversations, with enhanced interpretability.

Contribution

The paper proposes a novel Multimodal Attention Network with adaptive fusion and explainability modules for improved emotion recognition in complex group dialogue settings.

Findings

01 · 3-5% improvement in weighted F1 score
02 · 5-7% improvement in accuracy
03 · Enhanced interpretability of emotion predictions
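The first finding is reported in weighted F1. As a minimal sketch, assuming the standard definition (per-class F1 averaged with weights equal to each class's share of the gold labels, the same convention as scikit-learn's `average="weighted"`), the metric can be computed in plain Python. The function name and the emotion labels below are hypothetical and not taken from the paper.

```python
from collections import Counter
from typing import Hashable, Sequence


def weighted_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Per-class F1, averaged with weights proportional to each class's support in y_true."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        # Count true positives and false positives for this class; false negatives follow from support.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = n - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        # Weight this class's F1 by its share of the gold labels.
        score += (n / total) * f1
    return score


if __name__ == "__main__":
    gold = ["joy", "joy", "anger", "neutral"]  # hypothetical utterance labels
    pred = ["joy", "anger", "anger", "neutral"]
    print(round(weighted_f1(gold, pred), 4))  # → 0.75
```

Because each class is weighted by its support, frequent emotions dominate the score; macro F1 would instead weight every class equally.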

Abstract

Analyzing individual emotions during group conversations is crucial for developing intelligent agents capable of natural human-machine interaction. While reliable emotion recognition techniques depend on multiple modalities (text, audio, video), the inherent heterogeneity between these modalities and the dynamic cross-modal interactions shaped by an individual's unique behavioral patterns make emotion recognition very challenging. This difficulty is compounded in group settings, where an emotion and its temporal evolution are influenced not only by the individual but also by external context, such as audience reactions and the ongoing conversation. To meet this challenge, we propose a Multimodal Attention Network that captures cross-modal interactions at various levels of spatial abstraction by jointly learning its interactive bunch of mode-specific Peripheral and…
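The abstract names cross-modal attention as the core mechanism but is cut off before the architecture is described. As a generic illustration only, the sketch below shows standard scaled dot-product cross-attention in plain Python, in which each vector from one modality (here, assumed text features) attends over vectors from another (assumed audio features). It does not reproduce AMuSE's Peripheral networks or its adaptive fusion; the function names and toy vectors are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_modal_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Tuple[Matrix, Matrix]:
    """Generic scaled dot-product cross-attention (a sketch, not the AMuSE architecture).

    queries: vectors from one modality (hypothetical text features), shape [n_q][d]
    keys:    vectors from another modality (hypothetical audio features), shape [n_k][d]
    values:  vectors aligned with keys, shape [n_k][d_v]
    Returns the attended outputs [n_q][d_v] and the attention weights [n_q][n_k].
    """
    d = len(keys[0])
    d_v = len(values[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        # Each output is the attention-weighted sum of the value vectors.
        outputs.append([sum(wi * v[j] for wi, v in zip(w, values)) for j in range(d_v)])
        weights.append(w)
    return outputs, weights


if __name__ == "__main__":
    text = [[1.0, 0.0], [0.0, 1.0]]               # two hypothetical text tokens
    audio = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # three hypothetical audio frames
    fused, attn = cross_modal_attention(text, audio, audio)
    for row in attn:
        print([round(w, 3) for w in row])
```

Returning the attention weights alongside the fused output is one common way to inspect which frames of the other modality a prediction relied on; treating this as AMuSE's interpretability mechanism would be an assumption, not the paper's stated method.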


Taxonomy

Topics: Speech Recognition and Synthesis · Speech and dialogue systems