DRKF: Decoupled Representations with Knowledge Fusion for Multimodal Emotion Recognition
Peiyuan Jiang (School of Computer Science and Engineering, University of Electronic Science and Technology of China), Yao Liu (School of Information and Software Engineering, University of Electronic Science and Technology of China), Qiao Liu (School of Computer Science and Engineering…

TL;DR
This paper introduces DRKF, a novel multimodal emotion recognition method that decouples shared and specific features, fuses modalities with attention, and preserves emotional inconsistency cues to improve accuracy.
Contribution
DRKF combines contrastive mutual information estimation, a self-attention fusion encoder, and an emotion discrimination submodule to address modality heterogeneity and inconsistency in MER.
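As a rough illustration of what a self-attention fusion step over modality features can look like, the sketch below applies plain scaled dot-product attention to one feature vector per modality. It is not the paper's Fusion Encoder: the function name `self_attention_fuse`, the absence of learned projections, the mean pooling, and the use of "average attention received" as a proxy for the dominant modality are all assumptions made for this example.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention_fuse(tokens):
    """Fuse one feature vector per modality with scaled dot-product attention.

    Hypothetical helper: queries, keys, and values are the raw tokens
    (no learned projections). Returns (fused, received), where `received[j]`
    is the average attention weight that modality j gets from all queries.
    """
    n, d = len(tokens), len(tokens[0])
    scale = math.sqrt(d)
    # Row i holds the attention weights of modality i over all modalities.
    weights = [
        softmax([sum(q * k for q, k in zip(qi, kj)) / scale for kj in tokens])
        for qi in tokens
    ]
    # Each modality's output is a weighted sum of all modality tokens.
    attended = [
        [sum(w[j] * tokens[j][c] for j in range(n)) for c in range(d)]
        for w in weights
    ]
    # Mean-pool the attended tokens into a single fused vector (assumption).
    fused = [sum(row[c] for row in attended) / n for c in range(d)]
    received = [sum(w[j] for w in weights) / n for j in range(n)]
    return fused, received

# Toy features for three modalities (made-up numbers, e.g. text, audio, video).
tokens = [[2.0, 0.0], [0.1, 0.1], [1.5, 0.2]]
fused, received = self_attention_fuse(tokens)
dominant = received.index(max(received))  # proxy for the dominant modality
```

In a trained model the tokens would pass through learned query, key, and value projections, so which modality dominates would depend on the learned weights rather than on raw feature magnitude as it does here.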
Findings
Achieves state-of-the-art results on IEMOCAP, MELD, and M3ED datasets.
Handles emotional inconsistency through the emotion discrimination (ED) submodule, which preserves inconsistency cues.
Outperforms existing methods in multimodal emotion recognition.
Abstract
Multimodal emotion recognition (MER) aims to identify emotional states by integrating and analyzing information from multiple modalities. However, inherent modality heterogeneity and inconsistencies in emotional cues remain key challenges that hinder performance. To address these issues, we propose a Decoupled Representations with Knowledge Fusion (DRKF) method for MER. DRKF consists of two main modules: an Optimized Representation Learning (ORL) Module and a Knowledge Fusion (KF) Module. ORL employs a contrastive mutual information estimation method with progressive modality augmentation to decouple task-relevant shared representations and modality-specific features while mitigating modality heterogeneity. KF includes a lightweight self-attention-based Fusion Encoder (FE) that identifies the dominant modality and integrates emotional information from other modalities to enhance the…
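The abstract does not say which contrastive mutual information estimator ORL uses. A common choice for this kind of objective is InfoNCE, so the sketch below shows that estimator under stated assumptions: paired embeddings from two modalities, cosine similarity as the critic, and a hypothetical `temperature` parameter. The function names are illustrative and are not taken from the paper.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def l2_normalize(v):
    norm = math.sqrt(dot(v, v)) or 1.0  # avoid division by zero
    return [a / norm for a in v]

def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss over a batch.

    Row i of `positives` is the positive pair for row i of `anchors`;
    every other row in the batch acts as a negative.
    """
    a = [l2_normalize(v) for v in anchors]
    p = [l2_normalize(v) for v in positives]
    total = 0.0
    for i, ai in enumerate(a):
        logits = [dot(ai, pj) / temperature for pj in p]
        m = max(logits)
        # log-sum-exp of all logits, computed stably
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denom - logits[i]  # -log softmax of the positive pair
    return total / len(a)

def mi_lower_bound(anchors, positives, temperature=0.1):
    """InfoNCE lower bound on mutual information: log(N) - loss."""
    return math.log(len(anchors)) - info_nce(anchors, positives, temperature)

# Toy batch: two samples with one embedding per modality (made-up numbers).
text_emb = [[1.0, 0.0], [0.0, 1.0]]
audio_aligned = [[1.0, 0.0], [0.0, 1.0]]
audio_shuffled = [[0.0, 1.0], [1.0, 0.0]]
loss_aligned = info_nce(text_emb, audio_aligned)
loss_shuffled = info_nce(text_emb, audio_shuffled)
```

Minimizing this loss pulls paired cross-modal embeddings together and pushes unpaired ones apart. Because the bound is capped at log(N) for batch size N, larger batches allow a tighter estimate.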
