Emotion Knowledge Enhancement for Vision Large Language Models: A Self-Verification Approach for High-Quality Emotion Instruction Data Generation

Feifan Wang; Tengfei Song; Minggui He; Chang Su; Zhanglin Wu; Hao Yang; Wenming Zheng; Osamu Yoshie

arXiv:2505.18168 · cs.LG · May 27, 2025

TL;DR

This paper presents SEKE, a novel self-verification method that leverages emotion knowledge to generate high-quality facial emotion instruction data for vision large language models, enhancing their emotion perception capabilities.

Contribution

It introduces a cost-effective approach combining emotion knowledge and self-verification to produce comprehensive emotion annotations, improving facial emotion analysis performance.

Findings

01. Outperforms state-of-the-art methods on three emotion analysis tasks.

02. Constructs a new facial emotion instruction dataset (FEID).

03. Provides a benchmark (FEAB) for evaluating VLLM emotion perception.

Abstract

Facial emotion perception in vision large language models (VLLMs) is crucial for achieving natural human-machine interaction. However, creating high-quality annotations for both coarse- and fine-grained facial emotion analysis demands costly expertise. The lack of such high-quality instruction data limits the performance of VLLMs in facial emotion perception. To address this, we propose a self-verification approach with emotion knowledge enhancement (SEKE), which cost-effectively generates high-quality instruction data for multi-grained emotion analysis using a closed-source VLLM. This approach integrates prior human knowledge into VLLM inference, guided by the inherent correlations between three levels of emotion description at different granularities, i.e., discrete expression, valence-arousal, and action unit, to reliably generate comprehensive annotations. A self-verification strategy with…
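To make the idea of cross-level correlation concrete, here is a minimal Python sketch of a consistency check between a discrete expression label, a valence score, and a set of action units (AUs). It is not the paper's implementation: the `EMOTION_PRIORS` table, the `Annotation` type, the `is_consistent` function, and the overlap threshold are all hypothetical, and the AU sets are simplified textbook-style associations chosen only for illustration.

```python
from dataclasses import dataclass

# Hypothetical, simplified prior knowledge (an assumption for illustration,
# not the knowledge base used in the paper): AUs commonly associated with
# each expression and the expected sign of its valence.
EMOTION_PRIORS = {
    "happiness": {"aus": {6, 12}, "valence_sign": 1},
    "sadness": {"aus": {1, 4, 15}, "valence_sign": -1},
    "anger": {"aus": {4, 5, 7, 23}, "valence_sign": -1},
}


@dataclass(frozen=True)
class Annotation:
    expression: str          # discrete expression label
    valence: float           # assumed to lie in [-1, 1]
    arousal: float           # assumed to lie in [-1, 1]; unused by this check
    action_units: frozenset  # active AU numbers, e.g. frozenset({6, 12})


def is_consistent(ann: Annotation, min_au_overlap: int = 1) -> bool:
    """Return True if the three annotation levels agree with the priors."""
    prior = EMOTION_PRIORS.get(ann.expression)
    if prior is None:
        # Unknown label: this sketch rejects it rather than guessing.
        return False
    # Valence must have the sign expected for the expression.
    valence_ok = ann.valence * prior["valence_sign"] > 0
    # At least `min_au_overlap` of the expected AUs must be active.
    au_ok = len(prior["aus"] & ann.action_units) >= min_au_overlap
    return valence_ok and au_ok


good = Annotation("happiness", 0.7, 0.4, frozenset({6, 12, 25}))
bad = Annotation("happiness", -0.5, 0.2, frozenset({4}))
print(is_consistent(good), is_consistent(bad))
```

In a generation pipeline, a check of this kind could be used to flag annotations for regeneration. Whether SEKE applies its verification this way is not stated in the excerpt above.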

Taxonomy

Topics: Multimodal Machine Learning Applications · Intelligent Tutoring Systems and Adaptive Learning · Human Pose and Action Recognition