Contextual Emotion Recognition using Large Vision Language Models

Yasaman Etesam; Özge Nilay Yalçın; Chuxuan Zhang; Angelica Lim

arXiv:2405.08992·cs.CV·February 3, 2025

Open Access

TL;DR

This paper explores large vision language models for contextual emotion recognition and shows that a vision language model, fine-tuned even on a small dataset, outperforms traditional baselines on the EMOTIC dataset, with the aim of supporting emotionally aware AI systems.

Contribution

The paper evaluates two approaches enabled by large vision language models (image captioning followed by a language-only LLM, and vision language models applied directly) and highlights that fine-tuning is effective even on a small dataset.

Findings

01

Fine-tuned vision language models outperform traditional baselines.

02

The methods are compared under zero-shot and fine-tuned setups, with the significant improvement reported for the fine-tuned vision language model.

03

The results are intended to help robots and agents perform emotionally sensitive decision-making and interaction.

Abstract

"How does the person in the bounding box feel?" Achieving human-level recognition of the apparent emotion of a person in real-world situations remains an unsolved task in computer vision. Facial expressions are not enough: body pose, contextual knowledge, and commonsense reasoning all contribute to how humans perform this emotional theory of mind task. In this paper, we examine two major approaches enabled by recent large vision language models: 1) image captioning followed by a language-only LLM, and 2) vision language models, under zero-shot and fine-tuned setups. We evaluate the methods on the Emotions in Context (EMOTIC) dataset and demonstrate that a vision language model, fine-tuned even on a small dataset, can significantly outperform traditional baselines. The results of this work aim to help robots and agents perform emotionally sensitive decision-making and interaction in the…
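
The first approach in the abstract (image captioning followed by a language-only LLM) can be illustrated with a minimal sketch. This is not the authors' implementation: the prompt wording, the function names `build_prompt`, `parse_labels`, and `recognize`, and the stand-in `fake_llm` are all hypothetical, and `LABELS` is a short illustrative subset rather than the full EMOTIC label set. The caption is assumed to come from a separate captioning model.

```python
from typing import Callable, List

# Illustrative subset of emotion labels (assumption: not the full EMOTIC set).
LABELS = ["Anger", "Fear", "Happiness", "Sadness", "Surprise"]


def build_prompt(caption: str, labels: List[str] = LABELS) -> str:
    """Turn an image caption into a text-only question for a language model."""
    options = ", ".join(labels)
    return (
        f"Scene description: {caption}\n"
        "How does the person in the bounding box feel?\n"
        f"Answer with a comma-separated subset of: {options}."
    )


def parse_labels(response: str, labels: List[str] = LABELS) -> List[str]:
    """Keep only known labels mentioned in the response (multi-label output)."""
    lowered = response.lower()
    return [label for label in labels if label.lower() in lowered]


def recognize(caption: str, llm: Callable[[str], str]) -> List[str]:
    """Caption -> prompt -> LLM response -> list of predicted labels."""
    return parse_labels(llm(build_prompt(caption)))


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real language model call."""
    return "happiness, surprise"
```

Calling `recognize("A child opens a gift.", fake_llm)` runs the whole chain with the stand-in model; a real setup would pass a function that queries an actual LLM in place of `fake_llm`.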

Taxonomy

Topics: Text and Document Classification Technologies · Emotion and Mood Recognition · Sentiment Analysis and Opinion Mining