Quality-Controlled Multimodal Emotion Recognition in Conversations with Identity-Based Transfer Learning and MAMBA Fusion
Zanxu Wang, Homayoon Beigi

TL;DR
This paper introduces a quality-controlled, transfer learning-based multimodal emotion recognition framework that leverages identity-aware embeddings and MAMBA fusion to improve accuracy on conversation datasets.
Contribution
It proposes a novel quality control pipeline and combines identity-based transfer learning with MAMBA fusion for enhanced emotion recognition in conversations.
Findings
Achieved 64.8% accuracy on the MELD dataset.
Achieved 74.3% accuracy on the IEMOCAP dataset.
Demonstrated the effectiveness of identity-based embeddings in emotion recognition.
Abstract
This paper addresses data quality issues in multimodal emotion recognition in conversation (MERC) through systematic quality control and multi-stage transfer learning. We implement a quality control pipeline for the MELD and IEMOCAP datasets that validates speaker identity, audio-text alignment, and face detection. We leverage transfer learning from speaker and face recognition, assuming that identity-discriminative embeddings capture not only stable acoustic and facial traits but also person-specific patterns of emotional expression. We employ RecoMadeEasy(R) engines to extract 512-dimensional speaker and face embeddings, fine-tune MPNet-v2 for emotion-aware text representations, and adapt these features through emotion-specific MLPs trained on unimodal datasets. MAMBA-based trimodal fusion achieves 64.8% accuracy on MELD and 74.3% on IEMOCAP. These results show that combining…
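The abstract names three quality-control checks (speaker identity, audio-text alignment, face detection) but not how each one is decided. The following is a minimal Python sketch under stated assumptions, not the authors' pipeline: speaker identity is assumed to be checked as cosine similarity between an utterance's speaker embedding and the mean embedding of its labelled speaker, audio-text alignment as a plausible speaking rate in words per second, and face detection as at least one detected face in the clip. The criteria, thresholds, and all names (`Utterance`, `passes_quality_control`, `filter_dataset`) are hypothetical, and the toy 3-dimensional vectors stand in for the 512-dimensional RecoMadeEasy(R) embeddings.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Utterance:
    utt_id: str
    speaker: str               # speaker label from the dataset transcript
    speaker_emb: List[float]   # speaker embedding (512-d in the paper; toy size here)
    text: str                  # transcript of the utterance
    audio_seconds: float       # duration of the audio segment
    faces_detected: int        # number of faces found in the video clip (assumed given)


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def speaker_centroids(utts: List[Utterance]) -> Dict[str, List[float]]:
    """Mean embedding per labelled speaker, used as a hypothetical identity reference."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for u in utts:
        if u.speaker not in sums:
            sums[u.speaker] = [0.0] * len(u.speaker_emb)
            counts[u.speaker] = 0
        sums[u.speaker] = [s + x for s, x in zip(sums[u.speaker], u.speaker_emb)]
        counts[u.speaker] += 1
    return {spk: [s / counts[spk] for s in vec] for spk, vec in sums.items()}


def passes_quality_control(
    u: Utterance,
    centroids: Dict[str, List[float]],
    min_speaker_sim: float = 0.7,     # hypothetical similarity threshold
    min_words_per_sec: float = 0.5,   # hypothetical lower speaking-rate bound
    max_words_per_sec: float = 6.0,   # hypothetical upper speaking-rate bound
) -> bool:
    # 1) Speaker identity: the embedding must resemble its labelled speaker's centroid.
    ref = centroids.get(u.speaker)
    if ref is None or cosine(u.speaker_emb, ref) < min_speaker_sim:
        return False
    # 2) Audio-text alignment: the speaking rate must be plausible for the audio length.
    n_words = len(u.text.split())
    if u.audio_seconds <= 0 or n_words == 0:
        return False
    rate = n_words / u.audio_seconds
    if not (min_words_per_sec <= rate <= max_words_per_sec):
        return False
    # 3) Face detection: require at least one detected face in the clip.
    return u.faces_detected >= 1


def filter_dataset(utts: List[Utterance]) -> List[Utterance]:
    """Keep only utterances that pass all three hypothetical checks."""
    centroids = speaker_centroids(utts)
    return [u for u in utts if passes_quality_control(u, centroids)]


# Toy usage: u3 has a mismatched voice, u4 is 0.1 words/s, u5 has no detected face.
data = [
    Utterance("u1", "Joey", [1.0, 0.0, 0.1], "How you doin", 1.5, 1),
    Utterance("u2", "Joey", [0.9, 0.1, 0.0], "I'm fine thanks", 2.0, 1),
    Utterance("u3", "Joey", [0.0, 1.0, 0.0], "Wrong voice here", 1.5, 1),
    Utterance("u4", "Joey", [1.0, 0.0, 0.0], "Hi", 10.0, 1),
    Utterance("u5", "Joey", [1.0, 0.05, 0.0], "Okay then", 1.0, 0),
]
print([u.utt_id for u in filter_dataset(data)])  # ['u1', 'u2']
```

A mean centroid is itself pulled toward mislabelled utterances, so in practice a median or a reference built from a trusted subset of clips would be a more robust identity anchor.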
Taxonomy
Topics: Emotion and Mood Recognition · Face recognition and analysis · Sentiment Analysis and Opinion Mining
