VLLMs Provide Better Context for Emotion Understanding Through Common Sense Reasoning

Alexandros Xenos; Niki Maria Foteinopoulou; Ioanna Ntinou; Ioannis Patras; Georgios Tzimiropoulos

arXiv:2404.07078 · cs.CV · July 16, 2025 · 1 citation

TL;DR

This paper introduces a simple two-stage method leveraging Vision-and-Large-Language Models to generate natural language descriptions of emotions in context, which improves emotion recognition accuracy across multiple datasets.

Contribution

The work demonstrates that using VLLMs to produce emotion-related descriptions enhances in-context emotion classification, simplifying training and achieving state-of-the-art results.

Findings

01  Outperforms previous methods on the BoLD, EMOTIC, and CAER-S datasets.

02  Textual descriptions help constrain noisy visual inputs.

03  Achieves state-of-the-art performance without complex training pipelines.

Abstract

Recognising emotions in context involves identifying an individual's apparent emotions while considering contextual cues from the surrounding scene. Previous approaches to this task have typically designed explicit scene-encoding architectures or incorporated external scene-related information, such as captions. However, these methods often utilise limited contextual information or rely on intricate training pipelines to decouple noise from relevant information. In this work, we leverage the capabilities of Vision-and-Large-Language Models (VLLMs) to enhance in-context emotion classification in a more straightforward manner. Our proposed method follows a simple yet effective two-stage approach. First, we prompt VLLMs to generate natural language descriptions of the subject's apparent emotion in relation to the visual context. Second, the descriptions, along with the visual input, are…
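
The two-stage structure described in the abstract can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the prompt wording, the `LABELS` subset, and the names `Sample`, `stage_one`, `stage_two`, `fake_vllm`, and `keyword_classifier` are all hypothetical. Because the abstract is cut off before it says how the descriptions and the visual input are combined, the second stage here is a pluggable placeholder (a toy keyword matcher that ignores the image), standing in for whatever trained model would actually consume both inputs.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical label subset, for illustration only.
LABELS = ["happy", "sad", "angry"]


@dataclass
class Sample:
    image_path: str        # image containing the subject
    description: str = ""  # filled in by stage one


def build_prompt() -> str:
    # Illustrative prompt; the paper's actual wording is not given on this page.
    return ("Describe the apparent emotion of the person in the image, "
            "taking the surrounding scene into account.")


def stage_one(samples: List[Sample],
              describe: Callable[[str, str], str]) -> List[Sample]:
    """Stage 1: ask a VLLM for a description of each image."""
    prompt = build_prompt()
    for sample in samples:
        sample.description = describe(sample.image_path, prompt)
    return samples


def stage_two(sample: Sample,
              classify: Callable[[str, str], List[str]]) -> List[str]:
    """Stage 2: pass the image and its description to a classifier."""
    return classify(sample.image_path, sample.description)


def fake_vllm(image_path: str, prompt: str) -> str:
    # Stub standing in for a real VLLM call (assumption).
    return "The person is smiling at a birthday party and looks happy."


def keyword_classifier(image_path: str, description: str) -> List[str]:
    # Placeholder for a trained model: matches label names in the text
    # and ignores the image entirely.
    text = description.lower()
    return [label for label in LABELS if label in text]


samples = stage_one([Sample("img.jpg")], fake_vllm)
predicted = stage_two(samples[0], keyword_classifier)
```

Keeping both stages behind plain callables means the stub describer and the toy classifier can be swapped for real models without changing the surrounding flow.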

Code & Models

Repositories

nickyfot/emocommonsense
PyTorch · Official

Taxonomy

Topics: Intelligent Tutoring Systems and Adaptive Learning · Robotics and Automated Systems