Improving Multimodal Emotion Recognition by Leveraging Acoustic Adaptation and Visual Alignment
Zhixian Zhao, Haifeng Chen, Xi Li, Dongmei Jiang, Lei Xie

TL;DR
This paper improves multimodal emotion recognition (MER) by adapting acoustic features through Parameter-Efficient Fine-Tuning (PEFT), aligning visual features with the acoustic feature space using large-scale unlabeled data, and fusing the modalities with attention. It reaches a weighted F1 score of 88.90% on the MER2024-SEMI test set despite limited labeled data.
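Attention-based fusion of per-modality features can be illustrated with a minimal sketch. It assumes that each modality (for example acoustic, visual, text) has already been reduced to a fixed-length vector of the same dimension, and that a single learned query vector scores each modality. `attention_fuse` and `query` are hypothetical names, and the paper's actual fusion module may be structured differently.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    # Subtract the maximum score before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(features: List[Vector], query: Vector) -> Tuple[Vector, List[float]]:
    """Fuse per-modality vectors with scaled dot-product attention.

    `query` stands in for a learned parameter vector (hypothetical).
    Returns the fused vector and the weight assigned to each modality.
    """
    dim = len(query)
    # One relevance score per modality: dot(feature, query) / sqrt(dim).
    scores = [sum(f_i * q_i for f_i, q_i in zip(f, query)) / math.sqrt(dim) for f in features]
    weights = softmax(scores)
    # Weighted sum of the modality vectors, dimension by dimension.
    fused = [sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)]
    return fused, weights


# Toy example: one "acoustic" and one "visual" vector; the query favours the first axis.
fused, weights = attention_fuse([[1.0, 0.0], [0.0, 1.0]], [2.0, 0.0])
print(weights, fused)
```

Because the weights come from a softmax, the fused vector is a convex combination of the modality vectors, so whichever modality the query scores highest dominates the result.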
Contribution
It introduces a novel PEFT-based acoustic adaptation method and a visual alignment method driven by large-scale unlabeled data, which together improve MER performance.
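The visual alignment idea can be illustrated with a minimal sketch. The alignment objective is not given in this summary, so the code assumes a linear projection fitted by full-batch gradient descent to minimise the mean squared error between projected visual features and the acoustic features of the same unlabeled clips. `project`, `alignment_loss`, and `fit_projection` are hypothetical names; a real system would more likely use deep encoders and possibly a contrastive loss.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def project(W: Matrix, v: Vector) -> Vector:
    # y = W v maps a visual feature vector into the acoustic feature space.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def alignment_loss(W: Matrix, visual: List[Vector], acoustic: List[Vector]) -> float:
    # Mean squared error between projected visual features and paired acoustic features.
    total = 0.0
    for v, a in zip(visual, acoustic):
        total += sum((p - t) ** 2 for p, t in zip(project(W, v), a))
    return total / len(visual)


def fit_projection(visual: List[Vector], acoustic: List[Vector],
                   steps: int = 500, lr: float = 0.1) -> Matrix:
    """Fit W by gradient descent on alignment_loss (hypothetical helper).

    The acoustic features act as fixed targets; only the visual projection is learned.
    """
    d_in, d_out, n = len(visual[0]), len(acoustic[0]), len(visual)
    W = [[0.0] * d_in for _ in range(d_out)]
    for _ in range(steps):
        grad = [[0.0] * d_in for _ in range(d_out)]
        for v, a in zip(visual, acoustic):
            p = project(W, v)
            for i in range(d_out):
                err = p[i] - a[i]
                for j in range(d_in):
                    # d/dW_ij of (1/n) * sum ||W v - a||^2 is (2/n) * (W v - a)_i * v_j
                    grad[i][j] += 2.0 * err * v[j] / n
        for i in range(d_out):
            for j in range(d_in):
                W[i][j] -= lr * grad[i][j]
    return W
```

Since no emotion labels enter the loss, any paired audio-visual clips can be used for this step, which is what makes large-scale unlabeled data usable.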
Findings
Achieved a weighted F1 score of 88.90% on the MER2024-SEMI test set.
Demonstrated effective acoustic adaptation with a minimal number of learnable parameters.
Validated the benefit of semantic visual alignment in multimodal emotion recognition.
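The weighted F1 score reported above averages the per-class F1 scores and weights each class by its share of the true labels. This is the same definition used by scikit-learn's `f1_score(..., average="weighted")`. The following minimal stdlib sketch uses toy emotion labels in its test, not data from the paper.

```python
from collections import Counter
from typing import Hashable, Sequence


def weighted_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Support-weighted mean of per-class F1 scores."""
    support = Counter(y_true)  # number of true samples per class
    n = len(y_true)
    score = 0.0
    for label, count in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = count - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        # Each class contributes in proportion to its support.
        score += (count / n) * f1
    return score
```

Classes that appear only in the predictions have zero support and therefore contribute nothing, which matches the scikit-learn behaviour.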
Abstract
Multimodal Emotion Recognition (MER) aims to automatically identify and understand human emotional states by integrating information from various modalities. However, the scarcity of annotated multimodal data significantly hinders the advancement of this research field. This paper presents our solution for the MER-SEMI sub-challenge of MER 2024. First, to better adapt acoustic modality features for the MER task, we experimentally evaluate the contributions of different layers of the pre-trained speech model HuBERT in emotion recognition. Based on these observations, we perform Parameter-Efficient Fine-Tuning (PEFT) on the layers identified as most effective for emotion recognition tasks, thereby achieving optimal adaptation for emotion recognition with a minimal number of learnable parameters. Second, leveraging the strengths of the acoustic modality, we propose a feature alignment…
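The layer-selection and PEFT steps described in the abstract can be sketched as follows. The abstract does not name the PEFT method, so this sketch assumes LoRA-style low-rank adapters. The names `select_layers` and `LoRALinear` are hypothetical, and any per-layer probe scores passed to them are toy numbers rather than the paper's measurements.

```python
from typing import Dict, List

Matrix = List[List[float]]
Vector = List[float]


def matvec(M: Matrix, x: Vector) -> Vector:
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def select_layers(probe_scores: Dict[int, float], k: int) -> List[int]:
    # Rank layers by the score of a probe trained on each layer's features; keep the top k.
    ranked = sorted(probe_scores, key=lambda layer: probe_scores[layer], reverse=True)
    return sorted(ranked[:k])


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / r) * B @ A."""

    def __init__(self, W: Matrix, rank: int, alpha: float = 1.0):
        self.W = W  # frozen pre-trained weight, shape (d_out, d_in)
        d_out, d_in = len(W), len(W[0])
        self.rank = rank
        self.scale = alpha / rank
        # Deterministic placeholder values stand in for LoRA's random init of A.
        self.A = [[0.01 * ((i + j) % 3 - 1) for j in range(d_in)] for i in range(rank)]
        # B starts at zero, so the adapted layer initially equals the frozen layer.
        self.B = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x: Vector) -> Vector:
        base = matvec(self.W, x)
        update = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * u for b, u in zip(base, update)]

    def trainable_params(self) -> int:
        # Only A (rank x d_in) and B (d_out x rank) are trained.
        return self.rank * (len(self.W[0]) + len(self.W))
```

As an illustration of why this keeps the learnable parameter count small, a square weight with an assumed width of 1024 and rank r = 8 gives 8 × (1024 + 1024) = 16,384 trainable parameters, compared with 1,048,576 in the full matrix.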
Taxonomy
Topics: Emotion and Mood Recognition
