UniEmoX: Cross-modal Semantic-Guided Large-Scale Pretraining for Universal Scene Emotion Perception
Chuang Chen, Xiao Sun, Zhi Liu

TL;DR
UniEmoX is a novel cross-modal pretraining framework that combines psychological insights with contrastive learning and masked image modeling to improve universal scene emotion perception across diverse scenarios.
Contribution
UniEmoX introduces the first large-scale pretraining method integrating psychological theories with contrastive learning for emotion analysis in varied visual contexts.
Findings
Effective across six benchmark datasets
Outperforms existing emotion analysis methods
Demonstrates strong generalization to diverse scenarios
Abstract
Visual emotion analysis holds significant research value in both computer vision and psychology. However, existing methods for visual emotion analysis suffer from limited generalizability due to the ambiguity of emotion perception and the diversity of data scenarios. To tackle this issue, we introduce UniEmoX, a cross-modal semantic-guided large-scale pretraining framework. Inspired by psychological research emphasizing the inseparability of the emotional exploration process from the interaction between individuals and their environment, UniEmoX integrates scene-centric and person-centric low-level image spatial structural information, aiming to derive more nuanced and discriminative emotional representations. By exploiting the similarity between paired and unpaired image-text samples, UniEmoX distills rich semantic knowledge from the CLIP model to enhance emotional embedding…
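The abstract's use of paired and unpaired image-text similarity can be illustrated with a generic CLIP-style symmetric contrastive (InfoNCE) loss. This is only a minimal sketch, not the objective from the paper, whose exact loss is not given in the text above. The function names, the temperature value of 0.07, and the toy embeddings are all assumptions made for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def log_softmax(row):
    """Numerically stable log-softmax over a list of scores."""
    peak = max(row)
    log_z = peak + math.log(sum(math.exp(x - peak) for x in row))
    return [x - log_z for x in row]


def contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric InfoNCE loss (hypothetical helper, not the paper's code).

    Row i of image_embs is treated as paired with row i of text_embs;
    every other image-text combination in the batch is an unpaired negative.
    """
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    n = len(imgs)

    # Temperature-scaled cosine similarity between every image and every text.
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in txts]
        for img in imgs
    ]

    # Image-to-text direction: each image should pick out its own caption.
    i2t = -sum(log_softmax(logits[k])[k] for k in range(n)) / n

    # Text-to-image direction: same computation over the transposed matrix.
    columns = [[logits[r][c] for r in range(n)] for c in range(n)]
    t2i = -sum(log_softmax(columns[k])[k] for k in range(n)) / n

    return 0.5 * (i2t + t2i)


# Toy embeddings (assumed values): two orthogonal image/text pairs.
images = [[1.0, 0.0], [0.0, 1.0]]
aligned = contrastive_loss(images, [[1.0, 0.0], [0.0, 1.0]])
shuffled = contrastive_loss(images, [[0.0, 1.0], [1.0, 0.0]])
```

With these toy inputs, the loss for correctly aligned pairs is much smaller than the loss for shuffled pairs. That difference is the signal a contrastive objective uses to pull paired samples together and push unpaired ones apart.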
Taxonomy
Topics: Human Pose and Action Recognition · Emotion and Mood Recognition · Multimodal Machine Learning Applications
Methods: Contrastive Language-Image Pre-training · Contrastive Learning
