Multimodal Prompt Transformer with Hybrid Contrastive Learning for Emotion Recognition in Conversation
Shihao Zou, Xianying Huang, Xudong Shen

TL;DR
This paper introduces a Multimodal Prompt Transformer with Hybrid Contrastive Learning to improve emotion recognition in conversations by effectively fusing multimodal data and handling few-sample labels, outperforming existing models.
Contribution
The paper proposes a novel Multimodal Prompt Transformer and Hybrid Contrastive Learning strategy to enhance multimodal fusion and few-sample label recognition in ERC.
Findings
Outperforms state-of-the-art ERC models on benchmark datasets.
Effectively fuses multimodal information through prompt-based attention.
Improves recognition of emotions with limited labeled samples.
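The prompt-based attention mentioned above can be illustrated with a small sketch. This is not the authors' MPT, whose details are not given in this summary. It only assumes that prompt vectors derived from the weaker modalities are prepended to the keys and values that the text tokens attend over. The names `prompt_attention`, `text_states`, and `prompt_states` are hypothetical, and the learned projection matrices of a real Transformer layer are replaced by identity maps for brevity.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def prompt_attention(text_states, prompt_states):
    """Single-head scaled dot-product attention (illustrative assumption).

    Queries come from the text tokens; keys and values are the prompt
    vectors followed by the text tokens. Projections are identity maps.
    """
    d = len(text_states[0])
    memory = prompt_states + text_states  # prompts first, then text tokens
    outputs = []
    for q in text_states:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in memory])
        fused = [sum(w * v[i] for w, v in zip(weights, memory)) for i in range(d)]
        outputs.append(fused)
    return outputs

# Toy inputs: two text-token vectors and one prompt vector (made-up values).
text = [[1.0, 0.0], [0.0, 1.0]]
prompts = [[0.5, 0.5]]
fused = prompt_attention(text, prompts)
```

Each output row is a weighted average of the prompt and text vectors, so the prompt can shift a text token's representation without changing the sequence length. A real layer would add learned query, key, and value projections, multiple heads, and residual connections.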
Abstract
Emotion Recognition in Conversation (ERC) plays an important role in driving the development of human-machine interaction. Emotions can exist in multiple modalities, and multimodal ERC mainly faces two problems: (1) noise introduced during cross-modal information fusion, and (2) the difficulty of predicting emotion labels that have few samples and are semantically similar yet belong to different categories. To address these issues and fully utilize the features of each modality, we adopted the following strategies: first, deep emotion cue extraction was performed on modalities with strong representation ability, and feature filters were designed as multimodal prompt information for modalities with weak representation ability. Then, we designed a Multimodal Prompt Transformer (MPT) to perform cross-modal information fusion. MPT embeds multimodal fusion information into each attention layer of the…
Taxonomy
Topics: Emotion and Mood Recognition · Sentiment Analysis and Opinion Mining
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Dense Connections · Label Smoothing · Adam · Absolute Position Encodings · Residual Connection
