Fusion in Context: A Multimodal Approach to Affective State Recognition
Youssef Mohamed, Severin Lemaignan, Arzu Guneysu, Patric Jensfelt, and Christian Smith

TL;DR
This paper introduces a transformer-based multimodal fusion method that combines facial thermal data, facial action units, and textual context to improve emotion recognition in human-robot interaction, emphasizing the importance of context-aware approaches.
Contribution
It presents a novel transformer-based multimodal fusion framework that integrates multiple data modalities and contextual information for enhanced affective state recognition.
Findings
Fusing facial thermal data, facial action units, and textual context improves emotion recognition accuracy.
The context-aware multimodal approach outperforms unimodal and non-contextual methods.
The approach is demonstrated on a dataset collected from a tabletop game that induces various affective states.
Abstract
Accurate recognition of human emotions is a crucial challenge in affective computing and human-robot interaction (HRI). Emotional states play a vital role in shaping behaviors, decisions, and social interactions. However, emotional expressions can be influenced by contextual factors, leading to misinterpretations if context is not considered. Multimodal fusion, combining modalities like facial expressions, speech, and physiological signals, has shown promise in improving affect recognition. This paper proposes a transformer-based multimodal fusion approach that leverages facial thermal data, facial action units, and textual context information for context-aware emotion recognition. We explore modality-specific encoders to learn tailored representations, which are then fused using additive fusion and processed by a shared transformer encoder to capture temporal dependencies and…
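The additive fusion step named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the feature sizes, the sequence length, the random linear maps standing in for learned modality-specific encoders, and the assumption that textual context is available as one vector per time step are all hypothetical. The shared transformer encoder that would consume the fused sequence is omitted.

```python
import random

def make_linear(in_dim, out_dim, rng):
    """Random (out_dim x in_dim) weights standing in for a learned modality encoder."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]

def encode(sequence, weights):
    """Project every time step of one modality into the shared embedding space."""
    return [[sum(w * x for w, x in zip(row, step)) for row in weights]
            for step in sequence]

def additive_fusion(*encoded):
    """Sum the modality embeddings element-wise at every time step."""
    steps = len(encoded[0])
    assert all(len(e) == steps for e in encoded), "modalities must be time-aligned"
    return [[sum(vals) for vals in zip(*(e[t] for e in encoded))]
            for t in range(steps)]

rng = random.Random(0)
T, D = 4, 8  # hypothetical sequence length and shared embedding size
# Hypothetical per-modality feature sizes (not taken from the paper).
dims = {"thermal": 5, "action_units": 17, "context": 12}

# Synthetic inputs: one feature vector per time step for each modality.
inputs = {name: [[rng.random() for _ in range(d)] for _ in range(T)]
          for name, d in dims.items()}
encoders = {name: make_linear(d, D, rng) for name, d in dims.items()}

encoded = [encode(inputs[name], encoders[name]) for name in dims]
fused = additive_fusion(*encoded)  # T x D sequence for a shared sequence encoder
```

Because every modality is first projected to the same dimension D, the sum is well defined regardless of the raw feature sizes, and the fused sequence keeps one vector per time step.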
Taxonomy
Topics: Emotion and Mood Recognition
