Quality-Controlled Multimodal Emotion Recognition in Conversations with Identity-Based Transfer Learning and MAMBA Fusion

Zanxu Wang; Homayoon Beigi

arXiv:2511.14969 · eess.AS · November 20, 2025

Open Access

TL;DR

This paper introduces a quality-controlled, transfer-learning-based multimodal emotion recognition framework that leverages identity-aware speaker and face embeddings and MAMBA fusion, reporting 64.8% accuracy on MELD and 74.3% on IEMOCAP.

Contribution

It proposes a novel quality control pipeline for the MELD and IEMOCAP datasets and combines identity-based transfer learning with MAMBA fusion for emotion recognition in conversations.
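As a rough illustration of what such a pipeline could look like, the sketch below applies the three checks named in the abstract (speaker identity, audio-text alignment, face detection) to per-utterance metadata and records which checks each rejected utterance fails. Everything concrete here is an assumption: the `Utterance` fields, the speaking-rate proxy for alignment, and the thresholds are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


# Hypothetical per-utterance metadata; field names are illustrative only.
@dataclass
class Utterance:
    utt_id: str
    labeled_speaker: str      # speaker given in the dataset annotation
    predicted_speaker: str    # output of a speaker-recognition model (assumed)
    audio_duration_s: float
    transcript_words: int
    faces_detected: int       # output of a face detector (assumed)


# Hypothetical thresholds for a plausible speaking rate (words per second).
MIN_WORDS_PER_SECOND = 0.5
MAX_WORDS_PER_SECOND = 6.0


def speaker_ok(u: Utterance) -> bool:
    # Speaker identity: recognized speaker must match the annotated one.
    return u.predicted_speaker == u.labeled_speaker


def alignment_ok(u: Utterance) -> bool:
    # Audio-text alignment proxy: transcript length must fit the audio length.
    if u.audio_duration_s <= 0:
        return False
    rate = u.transcript_words / u.audio_duration_s
    return MIN_WORDS_PER_SECOND <= rate <= MAX_WORDS_PER_SECOND


def face_ok(u: Utterance) -> bool:
    # Face detection: require at least one detected face.
    return u.faces_detected >= 1


CHECKS: List[Tuple[str, Callable[[Utterance], bool]]] = [
    ("speaker_identity", speaker_ok),
    ("audio_text_alignment", alignment_ok),
    ("face_detection", face_ok),
]


def quality_filter(
    utts: List[Utterance],
) -> Tuple[List[Utterance], Dict[str, List[str]]]:
    """Keep utterances passing every check; map rejected ids to failed checks."""
    kept: List[Utterance] = []
    failures: Dict[str, List[str]] = {}
    for u in utts:
        failed = [name for name, check in CHECKS if not check(u)]
        if failed:
            failures[u.utt_id] = failed
        else:
            kept.append(u)
    return kept, failures


utts = [
    Utterance("u1", "spk_A", "spk_A", 2.0, 6, 1),    # 3 words/s, one face
    Utterance("u2", "spk_A", "spk_B", 2.0, 6, 1),    # speaker mismatch
    Utterance("u3", "spk_C", "spk_C", 1.0, 20, 0),   # 20 words/s, no face
]
kept, failures = quality_filter(utts)
print([u.utt_id for u in kept])
print(failures)
```

Recording the failed checks per utterance, rather than a single pass/fail flag, makes it easy to report how much data each check removes.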

Findings

1. Achieved 64.8% accuracy on the MELD dataset.
2. Achieved 74.3% accuracy on the IEMOCAP dataset.
3. Demonstrated the effectiveness of identity-based embeddings in emotion recognition.

Abstract

This paper addresses data quality issues in multimodal emotion recognition in conversation (MERC) through systematic quality control and multi-stage transfer learning. We implement a quality control pipeline for the MELD and IEMOCAP datasets that validates speaker identity, audio-text alignment, and face detection. We leverage transfer learning from speaker and face recognition, assuming that identity-discriminative embeddings capture not only stable acoustic and facial traits but also person-specific patterns of emotional expression. We employ RecoMadeEasy® engines to extract 512-dimensional speaker and face embeddings, fine-tune MPNet-v2 for emotion-aware text representations, and adapt these features through emotion-specific MLPs trained on unimodal datasets. MAMBA-based trimodal fusion achieves 64.8% accuracy on MELD and 74.3% on IEMOCAP. These results show that combining…
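The adapt-then-fuse flow described in the abstract can be sketched as follows, assuming each modality yields one vector per utterance. Per-modality MLPs project the 512-dimensional speaker and face embeddings and a text embedding into a shared width, the three projections are treated as a length-3 sequence, and a minimal Mamba-style selective scan (step size and input/output projections computed from each input) mixes them before a linear classifier. The 768-dimensional text size, all layer widths, the seven-class output, the simplified discretization, and the last-step readout are assumptions, and the weights are random, so this shows data flow and shapes only, not the authors' architecture or trained model.

```python
import numpy as np

EMB_DIM = 512     # speaker/face embedding size stated in the abstract
TEXT_DIM = 768    # assumed MPNet-base hidden size
HID = 64          # hypothetical shared width after the adapters
STATE = 16        # hypothetical SSM state size per channel
N_CLASSES = 7     # hypothetical number of emotion classes

rng = np.random.default_rng(0)


def init_mlp(d_in, d_hid, d_out):
    # Random two-layer MLP weights (untrained; for shape illustration only).
    return (rng.normal(0, d_in ** -0.5, (d_in, d_hid)), np.zeros(d_hid),
            rng.normal(0, d_hid ** -0.5, (d_hid, d_out)), np.zeros(d_out))


def mlp(x, w1, b1, w2, b2):
    # ReLU MLP standing in for a per-modality "emotion-specific" adapter.
    return np.maximum(x @ w1 + b1, 0.0) @ w2 + b2


def softplus(z):
    # Numerically stable log(1 + exp(z)).
    return np.logaddexp(0.0, z)


def selective_scan(x, A, W_delta, W_B, W_C):
    """Minimal selective scan over x of shape (L, D).

    A is a (D, N) matrix of negative reals (diagonal state dynamics per
    channel). delta_t, B_t and C_t depend on x_t, which is what makes the
    scan selective. Discretization: A_bar = exp(delta*A), B_bar = delta*B.
    """
    L, D = x.shape
    h = np.zeros((D, A.shape[1]))
    ys = np.empty((L, D))
    for t in range(L):
        delta = softplus(x[t] @ W_delta)          # (D,) per-channel step size
        B = x[t] @ W_B                            # (N,) input projection
        C = x[t] @ W_C                            # (N,) output projection
        A_bar = np.exp(delta[:, None] * A)        # (D, N), entries in (0, 1)
        h = A_bar * h + (delta[:, None] * B[None, :]) * x[t][:, None]
        ys[t] = h @ C                             # (D,)
    return ys


adapters = {
    "speaker": init_mlp(EMB_DIM, 256, HID),
    "face": init_mlp(EMB_DIM, 256, HID),
    "text": init_mlp(TEXT_DIM, 256, HID),
}
A = -np.exp(rng.normal(size=(HID, STATE)))        # strictly negative
W_delta = rng.normal(0, HID ** -0.5, (HID, HID))
W_B = rng.normal(0, HID ** -0.5, (HID, STATE))
W_C = rng.normal(0, HID ** -0.5, (HID, STATE))
W_out = rng.normal(0, HID ** -0.5, (HID, N_CLASSES))


def fuse_and_classify(speaker_emb, face_emb, text_emb):
    # Adapt each modality, then treat the three vectors as a sequence.
    tokens = np.stack([
        mlp(speaker_emb, *adapters["speaker"]),
        mlp(face_emb, *adapters["face"]),
        mlp(text_emb, *adapters["text"]),
    ])                                            # (3, HID)
    y = selective_scan(tokens, A, W_delta, W_B, W_C)
    logits = y[-1] @ W_out                        # read out the last step
    p = np.exp(logits - logits.max())             # stable softmax
    return p / p.sum()


probs = fuse_and_classify(rng.normal(size=EMB_DIM), rng.normal(size=EMB_DIM),
                          rng.normal(size=TEXT_DIM))
print(probs.shape, float(probs.sum()))
```

Because the scan is sequential, the modality order (speaker, face, text here) affects the fused state; that ordering is an arbitrary choice in this sketch.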

Taxonomy

Topics: Emotion and Mood Recognition · Face recognition and analysis · Sentiment Analysis and Opinion Mining