Emotion and Intention Guided Multi-Modal Learning for Sticker Response Selection
Yuxuan Hu, Jian Chen, Yuhao Wang, Zixuan Li, Jing Xiong, Pengyue Jia, Wei Wang, Chengming Li, Xiangyu Zhao

TL;DR
This paper introduces a novel multi-modal learning framework that jointly models emotion and intention to improve sticker response selection in online dialogue, addressing limitations of previous isolated modeling approaches.
Contribution
The paper proposes EIGML, described by the authors as the first framework to jointly model emotion and intention in multi-modal learning, incorporating dual-level contrastive alignment and a progressive fusion module.
Findings
Outperforms state-of-the-art methods on two public datasets
Achieves higher accuracy in sticker response selection
Demonstrates effective integration of emotional and intentional cues
Abstract
Stickers are widely used in online communication to convey emotions and implicit intentions. The Sticker Response Selection (SRS) task aims to select the most contextually appropriate sticker based on the dialogue. However, existing methods typically rely on semantic matching and model emotional and intentional cues separately, which can lead to mismatches when emotions and intentions are misaligned. To address this issue, we propose Emotion and Intention Guided Multi-Modal Learning (EIGML). This framework is the first to jointly model emotion and intention, effectively reducing the bias caused by isolated modeling and significantly improving selection accuracy. Specifically, we introduce a Dual-Level Contrastive Framework to perform both intra-modality and inter-modality alignment, ensuring consistent representation of emotional and intentional features within and across modalities. In…
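The intra-modality and inter-modality alignment mentioned in the abstract can be illustrated with a generic contrastive objective. The sketch below is not the authors' implementation: it assumes an InfoNCE-style loss, and the function names (`info_nce`, `dual_level_loss`), the pairing of features, the temperature, and the `weight` parameter are all hypothetical choices made for illustration only.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss: anchors[i] should be closest to positives[i].

    All other rows of `positives` act as in-batch negatives.
    """
    a = [l2_normalize(v) for v in anchors]
    p = [l2_normalize(v) for v in positives]
    total = 0.0
    for i, ai in enumerate(a):
        # Cosine similarity to every candidate, scaled by the temperature.
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        # Numerically stable log-sum-exp over the candidates.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_denom - logits[i]
    return total / len(a)


def dual_level_loss(text_emo, text_int, sticker_emo, sticker_int, weight=0.5):
    """Hypothetical combination of the two alignment levels (an assumption)."""
    # Intra-modality: emotion and intention features of the same modality.
    intra = info_nce(text_emo, text_int) + info_nce(sticker_emo, sticker_int)
    # Inter-modality: the same kind of feature across text and sticker.
    inter = info_nce(text_emo, sticker_emo) + info_nce(text_int, sticker_int)
    return weight * intra + (1 - weight) * inter
```

With correctly paired features the loss is close to zero, and it grows when the pairs are shuffled, which is the behaviour any such alignment objective relies on.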
Taxonomy
Topics: Emotion and Mood Recognition · Sentiment Analysis and Opinion Mining · Topic Modeling
