Bridging Discrete and Continuous: A Multimodal Strategy for Complex Emotion Detection
Jiehui Jia, Huan Zhang, Jinhua Liang

TL;DR
This paper introduces a multimodal framework that maps human emotions into a continuous Valence-Arousal-Dominance space, enabling more nuanced emotion recognition from facial expressions, voice, and transcripts, evaluated on Chinese media data.
Contribution
It presents a novel multimodal approach that transitions from discrete to continuous emotion modeling using VAD space and K-means clustering, enhancing emotional diversity and recognition accuracy.
Findings
Effective transformation between discrete and continuous emotion models
Achieved diverse and comprehensive emotion vocabulary
Maintained high accuracy in emotion recognition
Abstract
In the domain of human-computer interaction, accurately recognizing and interpreting human emotions is crucial yet challenging due to the complexity and subtlety of emotional expressions. This study explores the potential for detecting a rich and flexible range of emotions through a multimodal approach that integrates facial expressions, voice tones, and transcripts from video clips. We propose a novel framework that maps a variety of emotions into a three-dimensional Valence-Arousal-Dominance (VAD) space, which can reflect the fluctuations and positivity/negativity of emotions and enable a more varied and comprehensive representation of emotional states. We employed K-means clustering to transition emotions from traditional discrete categorization to a continuous labeling system and built a classifier for emotion recognition upon this system. The effectiveness of the proposed model is…
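As a rough illustration of the clustering step the abstract describes, the sketch below groups emotion words by their position in VAD space using a plain K-means (Lloyd's algorithm) written with the Python standard library only. It is not the authors' implementation: the VAD coordinates, the choice of k = 3, the helper names `nearest` and `kmeans`, and the seeding with the first k points are all assumptions made for the example.

```python
import math

# Hypothetical VAD coordinates (valence, arousal, dominance) in [-1, 1].
# These values are illustrative only and are not taken from the paper.
SAMPLES = {
    "happy":   (0.8, 0.5, 0.4),
    "angry":   (-0.6, 0.7, 0.4),
    "sad":     (-0.7, -0.4, -0.5),
    "excited": (0.7, 0.8, 0.5),
    "furious": (-0.7, 0.9, 0.5),
    "bored":   (-0.5, -0.7, -0.4),
}

def nearest(point, centroids):
    """Return the index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def kmeans(points, k, iterations=20):
    """Lloyd's algorithm, seeded with the first k points to stay deterministic.

    This seeding is a simplification: results depend on input order, and a
    practical setup would use a more robust initialization such as k-means++.
    """
    centroids = [points[i] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Move each centroid to the mean of its group; keep it if the group is empty.
        centroids = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else c
            for g, c in zip(groups, centroids)
        ]
    return centroids

points = list(SAMPLES.values())
centroids = kmeans(points, k=3)
labels = {name: nearest(vad, centroids) for name, vad in SAMPLES.items()}

# A new, unlabeled VAD estimate is assigned to the closest cluster.
new_cluster = nearest((0.6, 0.6, 0.3), centroids)
```

With these made-up coordinates, words that sit close together in VAD space share a cluster index, and a new VAD estimate can be given the label of its nearest centroid. How the paper's classifier actually consumes the cluster labels is not specified in the text above.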
Taxonomy
Topics: Sentiment Analysis and Opinion Mining
