Orthogonal Disentanglement with Projected Feature Alignment for Multimodal Emotion Recognition in Conversation
Xinyi Che, Wenbo Wang, Jian Guan, and Qijun Zhao

TL;DR
This paper introduces OD-PFA, a framework for multimodal emotion recognition in conversation that disentangles unimodal features into shared and modality-specific components and aligns the shared semantics across modalities.
Contribution
The paper proposes a new orthogonal disentanglement and feature alignment framework that explicitly models both shared and modality-specific emotional information.
Findings
Outperforms state-of-the-art methods on IEMOCAP and MELD datasets.
Effectively captures modality-specific nuances like micro-expressions and tone.
Enhances semantic coherence across modalities through projected feature alignment.
Abstract
Multimodal Emotion Recognition in Conversation (MERC) significantly enhances emotion recognition performance by integrating complementary emotional cues from text, audio, and visual modalities. While existing methods commonly utilize techniques such as contrastive learning and cross-attention mechanisms to align cross-modal emotional semantics, they typically overlook modality-specific emotional nuances like micro-expressions, tone variations, and sarcastic language. To overcome these limitations, we propose Orthogonal Disentanglement with Projected Feature Alignment (OD-PFA), a novel framework designed explicitly to capture both shared semantics and modality-specific emotional cues. Our approach first decouples unimodal features into shared and modality-specific components. An orthogonal disentanglement strategy (OD) enforces effective separation between these components, aided by a…
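To make the idea of orthogonal separation concrete, here is a minimal sketch of one common way to penalize overlap between a shared and a modality-specific feature vector: the mean squared cosine similarity between the two, computed per sample. This is an illustrative assumption, not the loss used in the paper, whose exact formulation is not given in the excerpt above. The function name `orthogonality_penalty` and the array shapes are hypothetical.

```python
import numpy as np


def orthogonality_penalty(shared, specific, eps=1e-8):
    """Mean squared cosine similarity between paired feature vectors.

    Hypothetical helper for illustration only (not the paper's loss).
    shared, specific: arrays of shape (batch, dim), one row per utterance.
    Returns 0.0 when every shared vector is orthogonal to its
    modality-specific counterpart, and approaches 1.0 when they are parallel.
    """
    shared = np.asarray(shared, dtype=float)
    specific = np.asarray(specific, dtype=float)

    # L2-normalize each row so the penalty ignores feature magnitude.
    s = shared / (np.linalg.norm(shared, axis=1, keepdims=True) + eps)
    p = specific / (np.linalg.norm(specific, axis=1, keepdims=True) + eps)

    # Row-wise dot product gives the cosine similarity per sample.
    cos = np.sum(s * p, axis=1)

    # Squaring penalizes both positive and negative correlation.
    return float(np.mean(cos ** 2))


if __name__ == "__main__":
    orthogonal_shared = np.array([[1.0, 0.0], [0.0, 2.0]])
    orthogonal_specific = np.array([[0.0, 3.0], [1.0, 0.0]])
    print(orthogonality_penalty(orthogonal_shared, orthogonal_specific))
    print(orthogonality_penalty(orthogonal_shared, orthogonal_shared))
```

In a training setup, a term like this would typically be added to the classification loss for each modality, so that minimizing it pushes the two components of a feature toward carrying non-overlapping information.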
Taxonomy
Topics: Emotion and Mood Recognition · Sentiment Analysis and Opinion Mining · Multimodal Machine Learning Applications
