VidEmo: Affective-Tree Reasoning for Emotion-Centric Video Foundation Models
Zhicheng Zhang, Weicheng Wang, Yongjie Zhu, Wenyu Qin, Pengfei Wan, Di Zhang, Jufeng Yang

TL;DR
This paper introduces VidEmo, an emotion reasoning framework for videos that unifies attribute perception, expression analysis, and emotional understanding. It uses a new emotion-centric dataset and two-stage training to improve emotion analysis performance.
Contribution
The paper presents VidEmo, a new video emotion foundation model with a two-stage training process and a large emotion-centric dataset, advancing emotion reasoning in videos.
Findings
Achieves competitive performance across 15 face perception tasks.
Introduces Emo-CFG, a large dataset with 2.1 million samples.
Demonstrates effective emotion reasoning and instruction-following capabilities.
Abstract
Understanding and predicting emotion from videos has garnered significant attention in recent studies, driven by advancements in video large language models (VideoLLMs). While advanced methods have made progress in video emotion analysis, the intrinsic nature of emotions poses significant challenges. Emotions are dynamic and cue-dependent, which makes it difficult to understand complex and evolving emotional states with a reasonable rationale. To tackle these challenges, we propose a novel affective cues-guided reasoning framework that unifies fundamental attribute perception, expression analysis, and high-level emotional understanding in a stage-wise manner. At the core of our approach is a family of video emotion foundation models (VidEmo), specifically designed for emotion reasoning and instruction-following. These models undergo a two-stage tuning process:…
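The stage-wise idea in the abstract (attribute perception, then expression analysis, then emotional understanding) can be illustrated with a minimal sketch. This is not the authors' implementation: only the three stage names come from the abstract, while `ReasoningTrace`, `build_prompt`, `run_stagewise`, and the `model` callable are hypothetical names introduced here, and the prompt wording is an assumption. The sketch only shows how each stage could condition on the answers of the earlier ones.

```python
from dataclasses import dataclass, field

# Stage names follow the abstract; everything else is a hypothetical illustration.
STAGES = ("attribute perception", "expression analysis", "emotional understanding")


@dataclass
class ReasoningTrace:
    """Accumulates (stage, answer) pairs so later stages can see earlier ones."""
    steps: list = field(default_factory=list)

    def add(self, stage, answer):
        self.steps.append((stage, answer))

    def context(self):
        # Render earlier answers as plain text, one line per completed stage.
        return "\n".join(f"[{stage}] {answer}" for stage, answer in self.steps)


def build_prompt(stage, trace):
    """Build the prompt for one stage, prefixed by all earlier stage answers."""
    header = f"Stage: {stage}."
    prior = trace.context()
    return f"{prior}\n{header}" if prior else header


def run_stagewise(model, stages=STAGES):
    """Query `model` (any callable str -> str) once per stage, in order."""
    trace = ReasoningTrace()
    for stage in stages:
        answer = model(build_prompt(stage, trace))
        trace.add(stage, answer)
    return trace
```

Here `model` stands in for any text-in, text-out callable, for example a wrapper around a VideoLLM that also receives the video. Passing a stub such as `lambda prompt: "..."` is enough to check that the third stage's prompt contains the first two answers.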
Taxonomy
Topics: Emotion and Mood Recognition · Multimodal Machine Learning Applications · Sentiment Analysis and Opinion Mining
