Divide and Refine: Enhancing Multimodal Representation and Explainability for Emotion Recognition in Conversation
Anh-Tuan Mai, Cam-Van Thi Nguyen, Duc-Trong Le

TL;DR
This paper introduces a two-phase Divide and Refine framework that explicitly decomposes and enhances multimodal representations for emotion recognition in conversation, improving performance across multiple models.
Contribution
It presents a novel Divide and Refine approach that explicitly decomposes multimodal signals into unique, redundant, and synergistic components, and refines them for better emotion recognition.
Findings
Consistent performance improvements on the IEMOCAP and MELD datasets.
Enhanced multimodal representations across a range of existing MERC models.
Evidence that explicit decomposition and refinement matter in multimodal learning.
Abstract
Multimodal emotion recognition in conversation (MERC) requires representations that effectively integrate signals from multiple modalities. These signals include modality-specific cues, information shared across modalities, and interactions that emerge only when modalities are combined. In information-theoretic terms, these correspond to unique, redundant, and synergistic contributions. An ideal representation should leverage all three, yet achieving such balance remains challenging. Recent advances in contrastive learning and augmentation-based methods have made progress, but they often overlook the role of data preparation in preserving these components. In particular, applying augmentations directly to raw inputs or fused embeddings can blur the boundaries between modality-unique and cross-modal signals. To address this challenge, we propose a two-phase framework…
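The three kinds of contribution named in the abstract can be illustrated on toy binary data. The sketch below is not the paper's method; it only computes plain mutual information for three hand-built cases, and the helper names (`mutual_information`, `profile`) and the toy variables `x1`, `x2`, `y` are assumptions introduced for illustration. A full decomposition into unique, redundant, and synergistic terms would additionally require choosing a redundancy measure, which is not done here.

```python
from collections import Counter
from itertools import product
from math import log2


def mutual_information(pairs):
    """I(A;B) in bits for a list of equally likely (a, b) outcomes."""
    n = len(pairs)
    p_ab = Counter(pairs)
    p_a = Counter(a for a, _ in pairs)
    p_b = Counter(b for _, b in pairs)
    return sum(
        (c / n) * log2((c / n) / ((p_a[a] / n) * (p_b[b] / n)))
        for (a, b), c in p_ab.items()
    )


def profile(samples):
    """samples: list of (x1, x2, y) tuples.

    Returns (I(X1;Y), I(X2;Y), I(X1,X2;Y)) in bits.
    """
    i1 = mutual_information([(x1, y) for x1, _, y in samples])
    i2 = mutual_information([(x2, y) for _, x2, y in samples])
    i12 = mutual_information([((x1, x2), y) for x1, x2, y in samples])
    return i1, i2, i12


bits = list(product([0, 1], repeat=2))

# Synergy: y = x1 XOR x2. Neither source alone says anything about y.
synergy = [(a, b, a ^ b) for a, b in bits]

# Uniqueness: y copies x1, and x2 is independent noise.
unique = [(a, b, a) for a, b in bits]

# Redundancy: both sources are identical copies of y.
redundant = [(a, a, a) for a in (0, 1)]

for name, data in [("synergy", synergy), ("unique", unique), ("redundant", redundant)]:
    i1, i2, i12 = profile(data)
    print(f"{name:9s} I(X1;Y)={i1:.1f}  I(X2;Y)={i2:.1f}  I(X1,X2;Y)={i12:.1f}")
```

In the XOR case each source alone carries 0 bits about `y` while the pair carries 1 bit, which is the signature of a purely synergistic interaction. In the copy case each source alone already carries the full 1 bit, so the second source adds nothing (redundancy). In the third case only `x1` is informative (uniqueness).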
Taxonomy
Topics: Emotion and Mood Recognition · Sentiment Analysis and Opinion Mining · Social Robot Interaction and HRI
