MVP: Multimodal Emotion Recognition based on Video and Physiological Signals
Valeriya Strizhkova, Hadi Kachmar, Hava Chaptoukaev, Raphael Kalandadze, Natia Kukhilava, Tatia Tsmindashvili, Nibras Abo-Alzahab, Maria A. Zuluaga, Michal Balazia, Antitza Dantcheva, François Brémond, Laura Ferrari

TL;DR
This paper introduces MVP, a deep learning architecture that effectively fuses video and physiological signals for emotion recognition, leveraging attention mechanisms to handle long input sequences and outperform existing methods.
Contribution
The paper presents MVP, a novel multimodal architecture that integrates video and physiological data using attention, enabling long sequence processing and improved emotion recognition accuracy.
Findings
MVP outperforms previous methods on facial video, EDA, and ECG/PPG data.
Attention mechanisms improve the handling of long input sequences.
The approach demonstrates superior performance in multimodal emotion recognition.
Abstract
Human emotions entail a complex set of behavioral, physiological and cognitive changes. Current state-of-the-art models fuse the behavioral and physiological components using classic machine learning, rather than recent deep learning techniques. We propose to fill this gap, designing the Multimodal for Video and Physio (MVP) architecture, streamlined to fuse video and physiological signals. Unlike other approaches, MVP exploits the benefits of attention to enable the use of long input sequences (1-2 minutes). We have studied video and physiological backbones for inputting long sequences and evaluated our method with respect to the state of the art. Our results show that MVP outperforms former methods for emotion recognition based on facial videos, EDA, and ECG/PPG.
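As a rough illustration of the kind of attention-based fusion the abstract describes, the following is a minimal sketch in plain Python. It is not the MVP architecture: the function names, the toy token values, the choice of letting video tokens query physiological tokens, and the mean-pool-and-concatenate step are all assumptions made for illustration only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: every query attends over all keys."""
    d = len(keys[0])
    out = []
    for q in queries:
        # Similarity of this query to each key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def mean_pool(tokens):
    """Average a sequence of token vectors into a single vector."""
    n = len(tokens)
    return [sum(t[j] for t in tokens) / n for j in range(len(tokens[0]))]


def fuse(video_tokens, physio_tokens):
    """Hypothetical fusion: video tokens query physiological tokens,
    then the pooled video and pooled attended features are concatenated."""
    attended = cross_attention(video_tokens, physio_tokens, physio_tokens)
    return mean_pool(video_tokens) + mean_pool(attended)


# Toy inputs (assumed): 3 video tokens and 2 physiological tokens, each of dimension 2.
video_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
physio_tokens = [[0.5, 0.5], [1.0, -1.0]]
fused = fuse(video_tokens, physio_tokens)
```

Because each query mixes information from the whole key sequence in one step, the same function applies whether the physiological stream has two tokens or several hundred, which is the property that makes attention a natural fit for long inputs.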
Taxonomy
Topics: EEG and Brain-Computer Interfaces · Emotion and Mood Recognition
Methods: Softmax · Attention Is All You Need · Sparse Evolutionary Training
