DRKF: Decoupled Representations with Knowledge Fusion for Multimodal Emotion Recognition

Peiyuan Jiang (School of Computer Science and Engineering, University of Electronic Science and Technology of China); Yao Liu (School of Information and Software Engineering, University of Electronic Science and Technology of China); Qiao Liu (School of Computer Science and Engineering, University of Electronic Science and Technology of China); Zongshun Zhang (School of Computer Science and Engineering, University of Electronic Science and Technology of China); Jiaye Yang (School of Computer Science and Engineering, University of Electronic Science and Technology of China); Lu Liu (School of Computer Science and Engineering, University of Electronic Science and Technology of China); Daibing Yao (Yizhou Prison, Sichuan Province)

arXiv:2508.01644·cs.MM·August 5, 2025

TL;DR

This paper introduces DRKF, a novel multimodal emotion recognition method that decouples shared and specific features, fuses modalities with attention, and preserves emotional inconsistency cues to improve accuracy.

Contribution

DRKF combines contrastive mutual information estimation, a self-attention fusion encoder, and an emotion discrimination (ED) submodule to address modality heterogeneity and emotional inconsistency in multimodal emotion recognition (MER).

Findings

01. Achieves state-of-the-art results on IEMOCAP, MELD, and M3ED datasets.
02. Effectively handles emotional inconsistency through the ED submodule.
03. Outperforms existing methods in multimodal emotion recognition.

Abstract

Multimodal emotion recognition (MER) aims to identify emotional states by integrating and analyzing information from multiple modalities. However, inherent modality heterogeneity and inconsistencies in emotional cues remain key challenges that hinder performance. To address these issues, we propose a Decoupled Representations with Knowledge Fusion (DRKF) method for MER. DRKF consists of two main modules: an Optimized Representation Learning (ORL) Module and a Knowledge Fusion (KF) Module. ORL employs a contrastive mutual information estimation method with progressive modality augmentation to decouple task-relevant shared representations and modality-specific features while mitigating modality heterogeneity. KF includes a lightweight self-attention-based Fusion Encoder (FE) that identifies the dominant modality and integrates emotional information from other modalities to enhance the…
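To make the idea of contrastive mutual information estimation concrete, the sketch below computes an InfoNCE-style loss between paired embeddings from two modalities. The abstract does not say which estimator DRKF uses, so InfoNCE is an assumption here, and the names `info_nce`, `mi_lower_bound`, `audio`, and `text` are hypothetical illustrations, not the authors' code.

```python
import math
import random


def l2_normalize(v):
    """Scale a vector to unit length (a zero vector is left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def info_nce(za, zb, temperature=0.1):
    """InfoNCE loss for paired embeddings: za[i] and zb[i] form a positive
    pair, and every zb[j] with j != i serves as a negative for za[i]."""
    za = [l2_normalize(v) for v in za]
    zb = [l2_normalize(v) for v in zb]
    n = len(za)
    total = 0.0
    for i in range(n):
        # Cosine similarity of sample i against every candidate, scaled.
        logits = [
            sum(a * b for a, b in zip(za[i], zb[j])) / temperature
            for j in range(n)
        ]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - logits[i]
    return total / n


def mi_lower_bound(za, zb, temperature=0.1):
    """InfoNCE bound on mutual information: I(A; B) >= log(N) - loss."""
    return math.log(len(za)) - info_nce(za, zb, temperature)


# Toy data (assumption): 8 utterances with 16-dimensional embeddings.
rng = random.Random(0)
audio = [[rng.gauss(0.0, 1.0) for _ in range(16)] for _ in range(8)]
# "Text" embeddings that share most of their content with the audio ones.
text = [[x + 0.05 * rng.gauss(0.0, 1.0) for x in row] for row in audio]
# Break the pairing by rotating the text embeddings by one position.
mismatched = text[1:] + text[:1]

aligned_loss = info_nce(audio, text)
mismatched_loss = info_nce(audio, mismatched)
```

With correctly paired inputs the loss is low and the bound approaches log(N); breaking the pairing raises the loss. Minimizing such a loss over shared representations pulls paired modalities together, which is one common way to encourage task-relevant shared features.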
