Leveraging Contrastive Learning and Self-Training for Multimodal Emotion Recognition with Limited Labeled Samples
Qi Fan, Yutong Li, Yi Xin, Xinyu Cheng, Guanglai Gao, Miao Ma

TL;DR
This paper introduces a semi-supervised multimodal emotion recognition approach combining contrastive learning, self-training, and ensemble voting to improve performance with limited labeled data, validated on the MER2024 challenge.
Contribution
It proposes a novel modality representation contrastive learning framework and a self-training strategy tailored for emotion recognition with scarce annotations.
Findings
Achieved 88.25% weighted F-score on MER2024-SEMI
Effectively addressed class imbalance with oversampling
Ranked 6th on the MER2024-SEMI leaderboard
Abstract
The Multimodal Emotion Recognition challenge MER2024 focuses on recognizing emotions using audio, language, and visual signals. In this paper, we present our submission solutions for the Semi-Supervised Learning Sub-Challenge (MER2024-SEMI), which tackles the issue of limited annotated data in emotion recognition. Firstly, to address the class imbalance, we adopt an oversampling strategy. Secondly, we propose a modality representation combinatorial contrastive learning (MR-CCL) framework on the trimodal input data to establish robust initial models. Thirdly, we explore a self-training approach to expand the training set. Finally, we enhance prediction robustness through a multi-classifier weighted soft voting strategy. Our proposed method is validated to be effective on the MER2024-SEMI Challenge, achieving a weighted average F-score of 88.25% and ranking 6th on the leaderboard. Our…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsEmotion and Mood Recognition
MethodsContrastive Learning
