Semi-supervised Multi-modal Emotion Recognition with Cross-Modal Distribution Matching
Jingjun Liang, Ruichen Li, Qin Jin

TL;DR
This paper introduces a semi-supervised multi-modal emotion recognition model that uses cross-modal distribution matching to leverage unlabeled data, improving performance over existing methods on the IEMOCAP and MELD datasets.
Contribution
The paper proposes a novel semi-supervised approach based on cross-modality distribution matching for multi-modal emotion recognition, addressing data scarcity and label ambiguity issues.
Findings
Outperforms state-of-the-art methods on IEMOCAP and MELD datasets.
Effectively utilizes unlabeled data to enhance emotion recognition.
Achieves competitive results without relying on auxiliary information.
Abstract
Automatic emotion recognition is an active research topic with a wide range of applications. Due to the high manual annotation cost and inevitable label ambiguity, the development of emotion recognition datasets is limited in both scale and quality. Therefore, one of the key challenges is how to build effective models with limited data resources. Previous works have explored different approaches to tackle this challenge, including data enhancement, transfer learning, and semi-supervised learning. However, these existing approaches suffer from weaknesses such as training instability, large performance loss during transfer, or marginal improvement. In this work, we propose a novel semi-supervised multi-modal emotion recognition model based on cross-modality distribution matching, which leverages abundant unlabeled data to enhance the model training under the assumption that the inner…
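The general idea of pairing a supervised loss with a cross-modal distribution matching term can be illustrated with a minimal sketch. This is not the paper's loss: the abstract above is truncated before the method is specified, so the sketch swaps in a simple moment-matching penalty (mean and covariance gap) as a stand-in. All names here (`moment_matching_penalty`, `semi_supervised_loss`, `z_a`, `z_b`, `weight`) are hypothetical and chosen only for illustration.

```python
import numpy as np


def moment_matching_penalty(z_a, z_b):
    """Squared gap between the means and covariances of two embedding batches.

    z_a, z_b: arrays of shape (batch, dim), e.g. latent embeddings of the same
    unlabeled utterances from two modalities (an assumption of this sketch).
    """
    mean_gap = np.sum((z_a.mean(axis=0) - z_b.mean(axis=0)) ** 2)
    cov_gap = np.sum((np.cov(z_a, rowvar=False) - np.cov(z_b, rowvar=False)) ** 2)
    return float(mean_gap + cov_gap)


def cross_entropy(logits, labels):
    """Mean cross-entropy for integer class labels, computed stably."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())


def semi_supervised_loss(logits, labels, z_a, z_b, weight=0.1):
    """Supervised loss on labeled data plus a matching term on unlabeled data.

    `weight` is a hypothetical trade-off hyperparameter, not a value from the paper.
    """
    return cross_entropy(logits, labels) + weight * moment_matching_penalty(z_a, z_b)
```

In this sketch the labeled batch contributes only through `cross_entropy`, while the unlabeled batch contributes only through the matching penalty, which needs no labels. That separation is what lets unlabeled data influence training.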
