Emotion Knowledge Enhancement for Vision Large Language Models: A Self-Verification Approach for High-Quality Emotion Instruction Data Generation
Feifan Wang, Tengfei Song, Minggui He, Chang Su, Zhanglin Wu, Hao Yang, Wenming Zheng, Osamu Yoshie

TL;DR
This paper presents SEKE, a novel self-verification method that leverages emotion knowledge to generate high-quality facial emotion instruction data for vision large language models, enhancing their emotion perception capabilities.
Contribution
It introduces a cost-effective approach combining emotion knowledge and self-verification to produce comprehensive emotion annotations, improving facial emotion analysis performance.
Findings
Outperforms state-of-the-art methods on three emotion analysis tasks.
Constructs a new facial emotion instruction dataset (FEID).
Provides a benchmark (FEAB) for evaluating VLLM emotion perception.
Abstract
Facial emotion perception in the vision large language model (VLLM) is crucial for achieving natural human-machine interaction. However, creating high-quality annotations for both coarse- and fine-grained facial emotion analysis demands costly expertise. The lack of such high-quality instruction data limits the performance of VLLMs in facial emotion perception. To address this, we propose a self-verification approach with emotion knowledge enhancement (SEKE), which cost-effectively generates high-quality instruction data for multi-grained emotion analysis using a closed-source VLLM. This approach integrates prior human knowledge into VLLM inference, guided by the inherent correlations between three granularity levels of emotion description, i.e., discrete expression, valence-arousal, and action unit, to reliably generate comprehensive annotations. A self-verification strategy with…
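The idea of checking an annotation against correlations between discrete expression, valence-arousal, and action units can be illustrated with a minimal sketch. This is not the paper's implementation: the `PROTOTYPES` table, the `Annotation` type, the `verify` function, and the `min_au_overlap` threshold are all hypothetical names chosen for illustration, and the action-unit sets are commonly cited FACS prototypes rather than values taken from SEKE.

```python
from dataclasses import dataclass

# Illustrative prior knowledge (an assumption of this sketch, not from the paper):
# commonly cited FACS prototype action units and the expected sign of valence.
PROTOTYPES = {
    "happiness": {"aus": {6, 12}, "valence_sign": 1},
    "sadness": {"aus": {1, 4, 15}, "valence_sign": -1},
    "anger": {"aus": {4, 5, 7, 23}, "valence_sign": -1},
}


@dataclass
class Annotation:
    expression: str      # discrete expression label
    valence: float       # assumed range [-1, 1]
    arousal: float       # assumed range [-1, 1]; carried but not checked here
    action_units: set    # active action unit numbers


def verify(ann, min_au_overlap=0.5):
    """Return a list of inconsistencies; an empty list means the levels agree."""
    proto = PROTOTYPES.get(ann.expression)
    if proto is None:
        return [f"unknown expression: {ann.expression}"]
    issues = []
    # Valence should not have the opposite sign to the one the expression implies.
    if ann.valence * proto["valence_sign"] < 0:
        issues.append("valence sign conflicts with expression")
    # A sufficient fraction of the prototype action units should be present.
    overlap = len(ann.action_units & proto["aus"]) / len(proto["aus"])
    if overlap < min_au_overlap:
        issues.append("action units do not match expression prototype")
    return issues


consistent = Annotation("happiness", 0.7, 0.4, {6, 12, 25})
conflicting = Annotation("happiness", -0.6, 0.2, {4, 15})
print(verify(consistent))
print(verify(conflicting))
```

In a generation pipeline of this kind, annotations that return a non-empty list could be regenerated or discarded; how SEKE actually handles such cases is not stated in the truncated abstract.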
Taxonomy
Topics: Multimodal Machine Learning Applications · Intelligent Tutoring Systems and Adaptive Learning · Human Pose and Action Recognition
