Divide and Refine: Enhancing Multimodal Representation and Explainability for Emotion Recognition in Conversation

Anh-Tuan Mai; Cam-Van Thi Nguyen; Duc-Trong Le

arXiv:2601.14274 · cs.LG · January 22, 2026

Open Access

TL;DR

This paper introduces a two-phase Divide and Refine framework that explicitly decomposes and enhances multimodal representations for emotion recognition in conversation, improving performance across multiple models.

Contribution

It presents a novel Divide and Refine approach that explicitly decomposes multimodal signals into unique, redundant, and synergistic components, and refines them for better emotion recognition.

Findings

1. Consistent performance improvements on the IEMOCAP and MELD datasets.
2. Effective enhancement of multimodal representations across various MERC models.
3. Evidence that explicit decomposition and refinement matter in multimodal learning.

Abstract

Multimodal emotion recognition in conversation (MERC) requires representations that effectively integrate signals from multiple modalities. These signals include modality-specific cues, information shared across modalities, and interactions that emerge only when modalities are combined. In information-theoretic terms, these correspond to unique, redundant, and synergistic contributions. An ideal representation should leverage all three, yet achieving such balance remains challenging. Recent advances in contrastive learning and augmentation-based methods have made progress, but they often overlook the role of data preparation in preserving these components. In particular, applying augmentations directly to raw inputs or fused embeddings can blur the boundaries between modality-unique and cross-modal signals. To address this challenge, we propose a two-phase framework…
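
To make the three information-theoretic terms in the abstract concrete, the following toy sketch computes mutual information for two binary "modalities" `x1`, `x2` and a label `y`. It is only an illustration of the standard textbook cases (a copied source, a duplicated source, and XOR), not the paper's method; the helper names `mutual_information` and `profile` are hypothetical, and each listed sample is assumed to be equally likely.

```python
from collections import Counter
from itertools import product
from math import log2


def mutual_information(pairs):
    """Return I(A;B) in bits, treating each (a, b) in `pairs` as equally likely."""
    n = len(pairs)
    joint = Counter(pairs)
    count_a = Counter(a for a, _ in pairs)
    count_b = Counter(b for _, b in pairs)
    return sum(
        (c / n) * log2(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in joint.items()
    )


def profile(samples):
    """Return (I(x1;y), I(x2;y), I((x1,x2);y)) for a list of (x1, x2, y) samples."""
    i1 = mutual_information([(x1, y) for x1, _, y in samples])
    i2 = mutual_information([(x2, y) for _, x2, y in samples])
    i12 = mutual_information([((x1, x2), y) for x1, x2, y in samples])
    return tuple(round(v, 3) for v in (i1, i2, i12))


bits = [0, 1]

# Unique: the label copies x1; x2 is independent noise.
unique = [(x1, x2, x1) for x1, x2 in product(bits, bits)]

# Redundant: both modalities carry the same bit, which is also the label.
redundant = [(b, b, b) for b in bits]

# Synergistic: the label is XOR, visible only when both modalities are combined.
synergistic = [(x1, x2, x1 ^ x2) for x1, x2 in product(bits, bits)]

for name, samples in [
    ("unique", unique),
    ("redundant", redundant),
    ("synergistic", synergistic),
]:
    print(name, profile(samples))
```

In the unique case only `x1` is informative about `y`; in the redundant case each modality alone already carries the full bit; in the XOR case each modality alone carries nothing while the pair carries one bit. Real conversational data mixes all three, which is the balance the abstract says a representation should preserve.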

Taxonomy

Topics: Emotion and Mood Recognition · Sentiment Analysis and Opinion Mining · Social Robot Interaction and HRI