Large Vision-Language Models as Emotion Recognizers in Context Awareness

Yuxuan Lei; Dingkang Yang; Zhaoyu Chen; Jiawei Chen; Peng Zhai; Lihua Zhang

arXiv:2407.11300 · cs.CV · July 17, 2024

TL;DR

This paper investigates Large Vision-Language Models for context-aware emotion recognition. It covers fine-tuning, zero-shot, few-shot, and reasoning-enhanced methods, and reports competitive performance with minimal training.

Contribution

The paper systematically evaluates Large Vision-Language Models (LVLMs) for context-aware emotion recognition (CAER) across multiple paradigms, including training-free in-context learning and reasoning-based approaches, and highlights their potential when extensive labeled data is unavailable.

Findings

01. LVLMs achieve competitive CAER performance across the evaluated paradigms.

02. Few-shot settings show strong results, suggesting that a small number of examples can be sufficient.

03. Incorporating Chain-of-Thought improves reasoning and interpretability.

Abstract

Context-aware emotion recognition (CAER) is a complex and significant task that requires perceiving emotions from various contextual cues. Previous approaches primarily focus on designing sophisticated architectures to extract emotional cues from images. However, their knowledge is confined to specific training datasets and may reflect the subjective emotional biases of the annotators. Furthermore, acquiring large amounts of labeled data is often challenging in real-world applications. In this paper, we systematically explore the potential of leveraging Large Vision-Language Models (LVLMs) to empower the CAER task from three paradigms: 1) We fine-tune LVLMs on two CAER datasets, which is the most common way to transfer large models to downstream tasks. 2) We design zero-shot and few-shot patterns to evaluate the performance of LVLMs in scenarios with limited data or even completely…
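The zero-shot and few-shot patterns mentioned in the abstract can be pictured with a small prompt-building sketch. This is an illustration under stated assumptions, not the authors' prompt or code: the label set, the `Exemplar` fields, the `<image:...>` placeholder syntax, and the `build_prompt` helper are all hypothetical, and the optional step-by-step instruction only stands in for the Chain-of-Thought idea noted in the findings.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical label set, for illustration only (not taken from the paper).
LABELS = ["happy", "sad", "angry", "fearful", "surprised", "neutral"]


@dataclass
class Exemplar:
    """One in-context demonstration (all fields are assumed names)."""
    image_ref: str  # placeholder identifier for an image attached to the prompt
    context: str    # short description of the scene cues
    label: str      # emotion label for the demonstration


def build_prompt(
    query_image_ref: str,
    exemplars: Optional[List[Exemplar]] = None,
    use_cot: bool = False,
) -> str:
    """Assemble a text prompt: zero-shot if no exemplars, few-shot otherwise."""
    lines = [
        "Task: identify the emotion of the person in the image, "
        "using facial, bodily, and scene context.",
        "Choose one label from: " + ", ".join(LABELS) + ".",
    ]
    for i, ex in enumerate(exemplars or [], start=1):
        if ex.label not in LABELS:
            raise ValueError(f"unknown label: {ex.label}")
        lines.append(f"Example {i}: <image:{ex.image_ref}>")
        lines.append(f"Context: {ex.context}")
        lines.append(f"Answer: {ex.label}")
    lines.append(f"Query: <image:{query_image_ref}>")
    if use_cot:
        # Ask for intermediate reasoning before the final label.
        lines.append(
            "First describe the relevant cues step by step, then give the label."
        )
    lines.append("Answer:")
    return "\n".join(lines)


if __name__ == "__main__":
    demos = [
        Exemplar("demo_1", "a person laughing at a birthday party", "happy"),
        Exemplar("demo_2", "a person sitting alone in the rain", "sad"),
    ]
    print(build_prompt("query_1", exemplars=demos, use_cot=True))
```

Calling `build_prompt("query_1")` with no exemplars gives the zero-shot form. Passing a few exemplars gives the few-shot form without any parameter updates, which is the sense in which in-context learning is training-free.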

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques · Human Pose and Action Recognition

Methods: Focus · Balanced Selection