MVP: Multimodal Emotion Recognition based on Video and Physiological Signals

Valeriya Strizhkova; Hadi Kachmar; Hava Chaptoukaev; Raphael Kalandadze; Natia Kukhilava; Tatia Tsmindashvili; Nibras Abo-Alzahab; Maria A. Zuluaga; Michal Balazia; Antitza Dantcheva; François Brémond; Laura Ferrari

arXiv:2501.03103 · cs.CV · January 7, 2025

Open Access

TL;DR

This paper introduces MVP, a deep learning architecture that effectively fuses video and physiological signals for emotion recognition, leveraging attention mechanisms to handle long input sequences and outperform existing methods.

Contribution

The paper presents MVP, a novel multimodal architecture that integrates video and physiological data using attention, enabling long sequence processing and improved emotion recognition accuracy.

Findings

01. MVP outperforms previous methods on facial video, EDA, and ECG/PPG data.

02. Attention mechanisms improve the handling of long input sequences.

03. The approach demonstrates superior performance in multimodal emotion recognition.

Abstract

Human emotions entail a complex set of behavioral, physiological and cognitive changes. Current state-of-the-art models fuse the behavioral and physiological components using classic machine learning, rather than recent deep learning techniques. We propose to fill this gap by designing the Multimodal for Video and Physio (MVP) architecture, streamlined to fuse video and physiological signals. Unlike other approaches, MVP exploits the benefits of attention to enable the use of long input sequences (1-2 minutes). We have studied video and physiological backbones for inputting long sequences and evaluated our method against the state of the art. Our results show that MVP outperforms previous methods for emotion recognition based on facial videos, EDA, and ECG/PPG.
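
The abstract does not spell out how MVP fuses the two modalities, so the following is only a generic sketch of attention-based fusion, not the authors' architecture. It assumes each modality has already been encoded by its own backbone into a sequence of feature tokens, and it uses plain scaled dot-product cross-attention without learned projections. The names `cross_attention` and `fuse`, the token counts, and the feature size of 16 are all hypothetical choices for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys_values):
    """Scaled dot-product attention: every query token attends over
    all tokens of the other modality (no learned weights here)."""
    d = queries.shape[-1]
    scores = queries @ keys_values.T / np.sqrt(d)   # (Tq, Tk)
    weights = softmax(scores, axis=-1)              # each row sums to 1
    return weights @ keys_values, weights           # (Tq, d), (Tq, Tk)

def fuse(video_tokens, physio_tokens):
    """Hypothetical fusion: enrich each modality with context from the
    other, mean-pool over time, and concatenate the two summaries."""
    v_ctx, _ = cross_attention(video_tokens, physio_tokens)
    p_ctx, _ = cross_attention(physio_tokens, video_tokens)
    v_summary = (video_tokens + v_ctx).mean(axis=0)   # residual + pooling
    p_summary = (physio_tokens + p_ctx).mean(axis=0)
    return np.concatenate([v_summary, p_summary])

# Assumed shapes: 120 video tokens and 240 physiological tokens
# covering a two-minute clip, each with 16 features.
rng = np.random.default_rng(0)
video_tokens = rng.standard_normal((120, 16))
physio_tokens = rng.standard_normal((240, 16))

fused = fuse(video_tokens, physio_tokens)
print(fused.shape)  # (32,)
```

Because attention weights are computed between every pair of tokens, the two sequences may have different lengths and sampling rates, which is one reason attention is a natural fit for long, heterogeneous inputs. A trained model would add learned query, key, and value projections and a classification head on top of the fused vector.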

Taxonomy

Topics: EEG and Brain-Computer Interfaces · Emotion and Mood Recognition

Methods: Softmax · Attention Is All You Need · Sparse Evolutionary Training