Learning Emotion Representations from Verbal and Nonverbal Communication
Sitao Zhang, Yimu Pan, James Z. Wang

TL;DR
EmotionCLIP is a novel pre-training framework that learns visual emotion representations from uncurated verbal and nonverbal communication data, addressing data scarcity and improving emotion recognition performance.
Contribution
This work introduces EmotionCLIP, the first pre-training paradigm for extracting emotion representations from communication data without curated labels, using subject-aware and sentiment-guided learning.
Findings
EmotionCLIP outperforms state-of-the-art supervised methods in emotion recognition.
It rivals multimodal approaches across various benchmarks.
It mitigates the scarcity of annotated data in emotion understanding by pre-training on uncurated communication data.
Abstract
Emotion understanding is an essential but highly challenging component of artificial general intelligence. The absence of extensively annotated datasets has significantly impeded advancements in this field. We present EmotionCLIP, the first pre-training paradigm to extract visual emotion representations from verbal and nonverbal communication using only uncurated data. Compared to numerical labels or descriptions used in previous methods, communication naturally contains emotion information. Furthermore, acquiring emotion representations from communication is more congruent with the human learning process. We guide EmotionCLIP to attend to nonverbal emotion cues through subject-aware context encoding and verbal emotion cues using sentiment-guided contrastive learning. Extensive experiments validate the effectiveness and transferability of EmotionCLIP. Using merely linear-probe…
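The abstract names sentiment-guided contrastive learning but does not give its formula. The sketch below is one plausible reading, not the authors' formulation: a symmetric CLIP-style InfoNCE loss over paired video and text embeddings, in which negatives whose text carries a similar sentiment to the anchor are down-weighted, since they are likely false negatives. The per-text `sentiments` scores in [-1, 1], the weighting rule in `sentiment_weight`, and the `beta` and `temperature` parameters are all assumptions introduced for illustration.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def sentiment_weight(s_i, s_j, beta):
    """Hypothetical weighting rule (an assumption, not taken from the paper).

    Scores lie in [-1, 1]. Similarity is 1 when the two scores are equal and
    0 when they sit at opposite ends of the range. Negatives with similar
    sentiment receive a smaller weight in the denominator.
    """
    similarity = 1.0 - abs(s_i - s_j) / 2.0
    return 1.0 - beta * similarity


def directional_loss(anchors, candidates, sentiments, temperature, beta):
    """Weighted InfoNCE in one direction (anchor i should match candidate i)."""
    n = len(anchors)
    total = 0.0
    for i in range(n):
        logits = [
            sum(a * c for a, c in zip(anchors[i], candidates[j])) / temperature
            for j in range(n)
        ]
        m = max(logits)  # subtract the max for numerical stability
        denom = 0.0
        for j in range(n):
            w = 1.0 if i == j else sentiment_weight(sentiments[i], sentiments[j], beta)
            denom += w * math.exp(logits[j] - m)
        total += -((logits[i] - m) - math.log(denom))
    return total / n


def sentiment_guided_contrastive_loss(video_emb, text_emb, sentiments,
                                      temperature=0.07, beta=0.5):
    """Average of the video-to-text and text-to-video weighted losses.

    beta must lie in [0, 1]; beta = 0 recovers the standard symmetric InfoNCE.
    """
    v = [l2_normalize(x) for x in video_emb]
    t = [l2_normalize(x) for x in text_emb]
    return 0.5 * (
        directional_loss(v, t, sentiments, temperature, beta)
        + directional_loss(t, v, sentiments, temperature, beta)
    )


if __name__ == "__main__":
    # Toy batch: two clips, their paired utterances, and made-up sentiment scores.
    video = [[1.0, 0.0], [0.0, 1.0]]
    text = [[0.9, 0.1], [0.2, 0.8]]
    scores = [0.8, -0.6]
    print(sentiment_guided_contrastive_loss(video, text, scores))
```

Setting `beta=0` gives the plain contrastive objective, so the effect of the sentiment weighting can be isolated by comparing the two settings on the same batch.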
Taxonomy
Topics: Emotion and Mood Recognition · Sentiment Analysis and Opinion Mining · Gaze Tracking and Assistive Technology
