ECMF: Enhanced Cross-Modal Fusion for Multimodal Emotion Recognition in MER-SEMI Challenge
Juewen Hu, Yexin Li, Jiulin Li, Shuo Chen, Pring Wong

TL;DR
This paper introduces a novel multimodal emotion recognition framework that leverages large-scale pre-trained models, a dual-branch visual encoder, context-enriched textual analysis, and advanced fusion strategies to improve performance in the MER-SEMI challenge.
Contribution
The paper presents a new multimodal emotion recognition framework with innovative feature extraction, fusion, and label refinement techniques tailored for the MER2025-SEMI dataset.
Findings
Achieved a weighted F-score of 87.49% on the MER2025-SEMI dataset.
Significantly outperformed the official baseline.
Validated the effectiveness of the proposed multimodal fusion strategy.
Abstract
Emotion recognition plays a vital role in enhancing human-computer interaction. In this study, we tackle the MER-SEMI challenge of the MER2025 competition by proposing a novel multimodal emotion recognition framework. To address the issue of data scarcity, we leverage large-scale pre-trained models to extract informative features from visual, audio, and textual modalities. Specifically, for the visual modality, we design a dual-branch visual encoder that captures both global frame-level features and localized facial representations. For the textual modality, we introduce a context-enriched method that employs large language models to enrich emotional cues within the input text. To effectively integrate these multimodal features, we propose a fusion strategy comprising two key components, i.e., self-attention mechanisms for dynamic modality weighting, and residual connections to preserve…
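The two fusion components named in the abstract, self-attention for dynamic modality weighting and residual connections, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `fuse_modalities`, the identity query/key/value projections (a real model would learn these), the equal feature dimension across modalities, and the final mean pooling are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def fuse_modalities(features):
    """Fuse per-modality feature vectors (hypothetical helper).

    features: list of equal-length vectors, e.g. [visual, audio, text].
    Assumption: queries, keys, and values are the raw features
    (no learned projection matrices).
    """
    dim = len(features[0])
    scale = math.sqrt(dim)
    attended = []
    for query in features:
        # Scaled dot-product attention: how much this modality
        # attends to every modality, itself included.
        weights = softmax([dot(query, key) / scale for key in features])
        context = [
            sum(w * value[i] for w, value in zip(weights, features))
            for i in range(dim)
        ]
        # Residual connection: add the original features back so the
        # modality's own information is kept alongside the context.
        attended.append([c + q for c, q in zip(context, query)])
    # Assumed pooling step: average the attended modality vectors.
    return [sum(vec[i] for vec in attended) / len(attended) for i in range(dim)]


if __name__ == "__main__":
    visual = [0.2, 0.9, -0.1, 0.4]
    audio = [0.1, 0.7, 0.0, 0.3]
    text = [-0.3, 0.5, 0.8, 0.1]
    print(fuse_modalities([visual, audio, text]))
```

In this sketch the attention weights are recomputed for every input, which is what makes the modality weighting dynamic, while the residual term keeps each modality's original features in the fused output even when its attention weight is small.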
Taxonomy
Topics: Emotion and Mood Recognition
