Hybrid Mutimodal Fusion for Dimensional Emotion Recognition
Ziyu Ma, Fuyan Ma, Bin Sun, Shutao Li

TL;DR
This paper presents a multimodal fusion approach that combines LSTM with self-attention and a late fusion strategy for continuous emotion and physiological arousal recognition, placing in the top 3 of the MuSe-Stress and MuSe-Physio sub-challenges of MuSe 2021.
Contribution
It introduces a novel multimodal fusion framework combining LSTM, self-attention, and late fusion for emotion and physiological arousal recognition in stress scenarios.
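To make the late fusion idea concrete, here is a minimal sketch in which each modality produces its own time-continuous prediction and the outputs are merged by a weighted average. This is an illustration only: the summary does not state how the paper fuses its modality-level predictions, and the function name `late_fusion`, the modality names, and the weights are all hypothetical.

```python
from typing import Dict, List


def late_fusion(
    preds: Dict[str, List[float]],
    weights: Dict[str, float],
) -> List[float]:
    """Fuse per-modality predictions by a weighted average (assumed scheme).

    preds   maps a modality name to its per-time-step predictions.
    weights maps the same modality names to non-negative weights.
    """
    if not preds:
        raise ValueError("at least one modality is required")
    if set(preds) != set(weights):
        raise ValueError("preds and weights must cover the same modalities")

    lengths = {len(series) for series in preds.values()}
    if len(lengths) != 1:
        raise ValueError("all modalities must have the same number of time steps")

    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")

    n_steps = lengths.pop()
    fused = []
    for t in range(n_steps):
        # Weighted sum of the modality-level predictions at time step t.
        weighted = sum(weights[name] * preds[name][t] for name in preds)
        fused.append(weighted / total_weight)
    return fused


# Hypothetical arousal predictions from two modality-specific models.
fused = late_fusion(
    {"audio": [0.2, 0.4], "video": [0.6, 0.8]},
    {"audio": 1.0, "video": 1.0},
)
```

With equal weights this reduces to a plain average of the two series. In practice, fusion weights or a learned fusion model would be chosen on a development set.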
Findings
Achieved top 3 ranking in MuSe-Stress and MuSe-Physio challenges.
Attained a concordance correlation coefficient (CCC) of 0.6159 for valence and 0.4609 for arousal.
Achieved CCC of 0.5412 for physiological arousal.
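The CCC figures above refer to the concordance correlation coefficient, which compares a predicted series with a reference series and penalizes both weak correlation and differences in mean or scale. The sketch below uses the standard definition, CCC = 2·cov(x, y) / (var(x) + var(y) + (mean(x) − mean(y))²), with population (divide-by-N) statistics. The function name `concordance_cc` and the sample values are hypothetical, and the challenge's own evaluation script may differ in detail.

```python
from typing import Sequence


def concordance_cc(pred: Sequence[float], gold: Sequence[float]) -> float:
    """Concordance correlation coefficient between two equal-length series."""
    if len(pred) != len(gold) or len(pred) == 0:
        raise ValueError("pred and gold must be non-empty and of equal length")

    n = len(pred)
    mean_p = sum(pred) / n
    mean_g = sum(gold) / n

    # Population variances and covariance (divide by n, not n - 1).
    var_p = sum((p - mean_p) ** 2 for p in pred) / n
    var_g = sum((g - mean_g) ** 2 for g in gold) / n
    cov = sum((p - mean_p) * (g - mean_g) for p, g in zip(pred, gold)) / n

    denom = var_p + var_g + (mean_p - mean_g) ** 2
    if denom == 0:
        raise ValueError("CCC is undefined for two identical constant series")
    return 2 * cov / denom


# Identical series agree perfectly; a constant offset lowers the score
# even though the Pearson correlation would still be 1.
perfect = concordance_cc([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
shifted = concordance_cc([2.0, 3.0, 4.0], [1.0, 2.0, 3.0])
```

The shifted example shows why CCC is stricter than Pearson correlation for time-continuous emotion prediction: a model that tracks the shape of the annotation but is biased by a constant still loses score.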
Abstract
In this paper, we present our solutions for the MuSe-Stress and MuSe-Physio sub-challenges of the Multimodal Sentiment Challenge (MuSe) 2021. The goal of the MuSe-Stress sub-challenge is to predict the level of emotional arousal and valence in a time-continuous manner from audio-visual recordings. The goal of the MuSe-Physio sub-challenge is to predict the level of psycho-physiological arousal from a) human annotations fused with b) galvanic skin response (also known as electrodermal activity, EDA) signals of the stressed participants. Both sub-challenges use the Ulm-TSST dataset, a novel subset of the audio-visual-textual Ulm-Trier Social Stress dataset that features German speakers in a stress situation induced by the Trier Social Stress Test (TSST). For the MuSe-Stress sub-challenge, we highlight our solutions in three aspects: 1) the audio-visual features…
Taxonomy
Topics: Emotion and Mood Recognition · Sentiment Analysis and Opinion Mining · Music and Audio Processing
Methods: Test · Tanh Activation · Sigmoid Activation · Long Short-Term Memory
