EMERSK -- Explainable Multimodal Emotion Recognition with Situational Knowledge
Mijanur Palash, Bharat Bhargava

TL;DR
EMERSK is a modular, multimodal emotion recognition system that combines visual cues and situational knowledge to improve accuracy and provide explanations for its predictions.
Contribution
The paper introduces EMERSK, a flexible system integrating multiple modalities and situational context for explainable emotion recognition using deep learning.
Findings
Multimodal fusion improves emotion recognition accuracy.
Situational knowledge enhances explanation quality.
The system outperforms state-of-the-art methods on benchmark datasets.
Abstract
Automatic emotion recognition has recently gained significant attention due to the growing popularity of deep learning algorithms. One of the primary challenges in emotion recognition is effectively utilizing the various cues (modalities) available in the data. Another challenge is providing a proper explanation of the outcome of the learning. To address these challenges, we present Explainable Multimodal Emotion Recognition with Situational Knowledge (EMERSK), a generalized and modular system for human emotion recognition and explanation using visual information. Our system can handle multiple modalities, including facial expressions, posture, and gait, in a flexible and modular manner. The network consists of different modules that can be added or removed depending on the available data. We utilize a two-stream network architecture with convolutional neural networks (CNNs) and…
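The abstract says modules can be added or removed depending on the available data. The truncated text above does not say how module outputs are combined, so the following is only a minimal sketch of one common option, late fusion by averaging, and should not be read as the authors' method. The label set, the modality names used as dictionary keys, and the functions `fuse_available` and `predict` are all hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical label set; the paper's actual emotion classes are not given here.
EMOTIONS = ["angry", "happy", "neutral", "sad"]


def fuse_available(scores: Dict[str, Optional[List[float]]]) -> List[float]:
    """Average per-class probabilities over the modalities that are present.

    A modality mapped to None is treated as unavailable and skipped, which
    mimics removing that module. Plain averaging is an assumption, not the
    fusion rule described in the paper.
    """
    present = [p for p in scores.values() if p is not None]
    if not present:
        raise ValueError("at least one modality is required")
    n = len(present)
    return [sum(p[i] for p in present) / n for i in range(len(EMOTIONS))]


def predict(scores: Dict[str, Optional[List[float]]]) -> str:
    """Return the label with the highest fused probability."""
    fused = fuse_available(scores)
    best = max(range(len(fused)), key=fused.__getitem__)
    return EMOTIONS[best]


# Example: posture is missing, so only face and gait contribute.
example = {
    "face": [0.1, 0.7, 0.1, 0.1],
    "posture": None,
    "gait": [0.2, 0.4, 0.3, 0.1],
}
label = predict(example)
```

Because missing modalities are simply skipped, the same call works whether one, two, or all three cues are available, which is the flexibility the abstract describes at a high level.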
Taxonomy
Topics: Emotion and Mood Recognition · Human Pose and Action Recognition · Gait Recognition and Analysis
