WavFusion: Towards wav2vec 2.0 Multimodal Speech Emotion Recognition
Feng Li, Jiusong Luo, Wanjun Xia

TL;DR
WavFusion is a multimodal speech emotion recognition framework that combines a gated cross-modal attention mechanism with multimodal homogeneous feature discrepancy learning, and reports improved performance over existing state-of-the-art methods on benchmark datasets.
Contribution
The paper presents WavFusion, a new multimodal SER model that effectively captures cross-modal interactions and learns discriminative features, outperforming prior approaches.
Findings
WavFusion achieves higher accuracy than existing state-of-the-art methods on the IEMOCAP and MELD datasets.
Effective multimodal fusion improves emotion recognition performance.
Abstract
Speech emotion recognition (SER) remains a challenging yet crucial task due to the inherent complexity and diversity of human emotions. To address this problem, researchers attempt to fuse information from other modalities via multimodal learning. However, existing multimodal fusion techniques often overlook the intricacies of cross-modal interactions, resulting in suboptimal feature representations. In this paper, we propose WavFusion, a multimodal speech emotion recognition framework that addresses critical research problems in effective multimodal fusion, heterogeneity among modalities, and discriminative representation learning. By leveraging a gated cross-modal attention mechanism and multimodal homogeneous feature discrepancy learning, WavFusion demonstrates improved performance over existing state-of-the-art methods on benchmark datasets. Our work highlights the importance of…
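To make the idea of gated cross-modal attention concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the formulation, so the single-head attention, the sigmoid gate computed from the concatenated audio and attended features, the residual connection, and all names (`init_params`, `gated_cross_modal_attention`, `W_q`, `W_k`, `W_v`, `W_g`, `b_g`) are hypothetical choices made for this example. Audio frames act as queries and a second modality (for example text tokens) supplies keys and values.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_params(d_audio, d_other, d_key, seed=0):
    # Hypothetical parameter set; random values stand in for learned weights.
    rng = np.random.default_rng(seed)
    scale = 0.1
    return {
        "W_q": rng.normal(0, scale, (d_audio, d_key)),
        "W_k": rng.normal(0, scale, (d_other, d_key)),
        "W_v": rng.normal(0, scale, (d_other, d_audio)),
        "W_g": rng.normal(0, scale, (2 * d_audio, d_audio)),
        "b_g": np.zeros(d_audio),
    }


def gated_cross_modal_attention(audio, other, p):
    """audio: (T_a, d_audio) queries; other: (T_o, d_other) keys and values."""
    q = audio @ p["W_q"]                       # (T_a, d_key)
    k = other @ p["W_k"]                       # (T_o, d_key)
    v = other @ p["W_v"]                       # (T_o, d_audio)

    # Scaled dot-product attention from audio frames to the other modality.
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]))   # (T_a, T_o)
    attended = weights @ v                               # (T_a, d_audio)

    # Assumed gate: a per-feature value in (0, 1) that decides how much
    # cross-modal information is mixed into each audio frame.
    gate_in = np.concatenate([audio, attended], axis=-1)
    gate = sigmoid(gate_in @ p["W_g"] + p["b_g"])        # (T_a, d_audio)

    fused = audio + gate * attended                      # gated residual fusion
    return fused, weights, gate


# Usage with made-up shapes: 50 audio frames, 12 text tokens.
rng = np.random.default_rng(1)
audio_feats = rng.normal(size=(50, 16))
text_feats = rng.normal(size=(12, 8))
params = init_params(d_audio=16, d_other=8, d_key=4)
fused, weights, gate = gated_cross_modal_attention(audio_feats, text_feats, params)
print(fused.shape, weights.shape)  # (50, 16) (50, 12)
```

In this sketch the gate lets the model fall back to the audio representation when the other modality is uninformative: as the gate approaches zero, the fused output approaches the original audio features.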
Taxonomy
Topics: Speech Recognition and Synthesis · Emotion and Mood Recognition · Speech and Audio Processing
Methods: Softmax · Attention Is All You Need
