TL;DR
This paper presents a multimodal emotion recognition system that combines Emotion-LLaMA with Conv-Attention. By improving annotation quality and multimodal fusion, it won the MER-NOISE and MER-OV tracks of the MER2024 Challenge.
Contribution
The paper introduces Conv-Attention, a lightweight hybrid framework for multimodal fusion that mitigates modality-specific noise. It also uses Emotion-LLaMA to generate high-quality annotations for unlabeled samples, which addresses the scarcity of labeled data.
Findings
Achieved a weighted average F-score of 85.30% in MER-NOISE, 1.47% and 1.65% ahead of the second- and third-place teams.
Improved average accuracy and recall by 8.52% over GPT-4V in MER-OV.
Secured the top score among large multimodal models in MER-OV.
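The MER-NOISE result is reported as a weighted average F-score. The sketch below shows how that metric is conventionally computed: per-class F1 scores averaged with weights proportional to each class's support. The function name `weighted_f1` and the toy labels are hypothetical and only illustrate the standard definition. They are not taken from the paper or from the official challenge scorer.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted average of per-class F1 scores (illustrative sketch)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("need at least one sample")
    support = Counter(y_true)  # number of gold samples per class
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        predicted = sum(1 for p in y_pred if p == label)
        precision = tp / predicted if predicted else 0.0
        recall = tp / n
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        score += f1 * n / total  # weight each class F1 by its share of gold samples
    return score


# Hypothetical toy labels, not MER2024 data.
gold = ["happy", "sad", "sad", "neutral"]
pred = ["happy", "sad", "neutral", "neutral"]
print(round(weighted_f1(gold, pred), 4))  # 0.75
```

Weighting by support means frequent emotion classes dominate the score. Classes that appear only in the predictions have zero support, so they contribute nothing directly.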
Abstract
This paper presents our winning approach for the MER-NOISE and MER-OV tracks of the MER2024 Challenge on multimodal emotion recognition. Our system leverages the advanced emotional understanding capabilities of Emotion-LLaMA to generate high-quality annotations for unlabeled samples, addressing the challenge of limited labeled data. To enhance multimodal fusion while mitigating modality-specific noise, we introduce Conv-Attention, a lightweight and efficient hybrid framework. Extensive experimentation validates the effectiveness of our approach. In the MER-NOISE track, our system achieves a state-of-the-art weighted average F-score of 85.30%, surpassing the second and third-place teams by 1.47% and 1.65%, respectively. For the MER-OV track, our utilization of Emotion-LLaMA for open-vocabulary annotation yields an 8.52% improvement in average accuracy and recall compared to GPT-4V,…
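The MER-OV comparison is stated in terms of average accuracy and recall over open-vocabulary emotion labels, where each sample may carry several free-form labels. The following is a minimal sketch that assumes set-level definitions. Accuracy is the fraction of predicted labels found in the gold set. Recall is the fraction of gold labels that were predicted. Both are averaged over samples, and the final score is their mean. The function name and data are hypothetical. The official evaluation may also normalize synonyms, and this sketch does not: it only does exact, lowercased matching.

```python
def open_vocab_scores(preds, golds):
    """Average set-level accuracy, recall, and their mean (illustrative sketch).

    Assumption: labels match only when equal after lowercasing;
    no synonym grouping is performed.
    """
    if len(preds) != len(golds):
        raise ValueError("preds and golds must have the same length")
    if not golds:
        raise ValueError("need at least one sample")
    acc_total = rec_total = 0.0
    for pred, gold in zip(preds, golds):
        p = {w.lower() for w in pred}
        g = {w.lower() for w in gold}
        hits = len(p & g)  # labels both predicted and annotated
        acc_total += hits / len(p) if p else 0.0
        rec_total += hits / len(g) if g else 0.0
    n = len(golds)
    accuracy, recall = acc_total / n, rec_total / n
    return accuracy, recall, (accuracy + recall) / 2


# Hypothetical open-vocabulary labels for two clips.
preds = [["happy", "excited"], ["sad"]]
golds = [["happy"], ["sad", "tired"]]
print(open_vocab_scores(preds, golds))  # (0.75, 0.75, 0.75)
```

Averaging accuracy with recall penalizes both padding predictions with extra labels and omitting annotated ones.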
