EmoGist: Efficient In-Context Learning for Visual Emotion Understanding

Ronald Seoh; Dan Goldwasser

arXiv:2505.14660 · cs.CL · September 23, 2025

TL;DR

EmoGist is a training-free in-context learning approach that improves visual emotion classification by pairing each test image with a context-dependent label description derived from clusters of example images, with F1 gains on both the Memotion and FI datasets.

Contribution

We propose EmoGist, a novel method that leverages label descriptions and image clustering for improved emotion recognition without additional training.

Findings

01. Up to a 12-point improvement in micro F1 on the multi-label Memotion dataset.

02. Up to an 8-point improvement in macro F1 on the multi-class FI dataset.

03. Effective in both multi-label and multi-class emotion classification tasks.
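
The findings above are reported in two different F1 variants, so a short sketch of how they differ may help. Micro F1 pools true positives, false positives, and false negatives across all labels before computing one score, while macro F1 averages the per-label scores so that rare labels count as much as frequent ones. The function names and the toy counts below are illustrative assumptions, not the paper's evaluation code or results.

```python
from typing import Dict, Tuple

Counts = Tuple[int, int, int]  # (true positives, false positives, false negatives)


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(counts: Dict[str, Counts]) -> Tuple[float, float]:
    """Return (micro F1, macro F1) for per-label counts."""
    # Macro: score each label separately, then take the unweighted mean.
    macro = sum(f1(*c) for c in counts.values()) / len(counts)
    # Micro: pool the counts over all labels, then score once.
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return f1(tp, fp, fn), macro


# Made-up counts for two hypothetical labels (not taken from the paper).
toy_counts = {"joy": (8, 2, 2), "fear": (1, 1, 3)}
micro, macro = micro_macro_f1(toy_counts)
```

With these toy counts the frequent label dominates the micro score, while the poorly predicted rare label pulls the macro score down, which is why the two numbers are not interchangeable.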

Abstract

In this paper, we introduce EmoGist, a training-free, in-context learning method for visual emotion classification with large vision-language models (LVLMs). The key intuition of our approach is that context-dependent definitions of emotion labels could allow more accurate emotion predictions, because the ways in which emotions manifest in images are highly context-dependent and nuanced. EmoGist pre-generates multiple descriptions of each emotion label by analyzing clusters of example images belonging to that label. At test time, we retrieve a version of the description based on the cosine similarity of the test image to the cluster centroids, and feed it together with the test image to a fast LVLM for classification. Our experiments show that EmoGist yields improvements of up to 12 points in micro F1 on the multi-label Memotion dataset and up to 8 points in macro F1 on the multi-class FI dataset.
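
The test-time retrieval step described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes image embeddings are plain float vectors, that each label keeps a list of (centroid, description) pairs produced ahead of time, and that one description is picked per label by highest cosine similarity. The abstract does not specify the embedding model, the selection granularity, or the prompt format, so all names and data below are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
# Assumed layout: label -> list of (cluster centroid, pre-generated description).
ClusterIndex = Dict[str, List[Tuple[Vector, str]]]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_descriptions(test_embedding: Vector, index: ClusterIndex) -> Dict[str, str]:
    """For each label, pick the description whose centroid is closest to the test image."""
    selected = {}
    for label, entries in index.items():
        _, description = max(
            entries, key=lambda entry: cosine_similarity(test_embedding, entry[0])
        )
        selected[label] = description
    return selected


# Toy 2-D embeddings and made-up descriptions, for illustration only.
toy_index: ClusterIndex = {
    "amusement": [
        ([1.0, 0.0], "people laughing together at a gathering"),
        ([0.0, 1.0], "animals in playful poses"),
    ],
    "sadness": [
        ([1.0, 1.0], "a person sitting alone indoors"),
        ([-1.0, 0.0], "an empty, overcast landscape"),
    ],
}
selected = retrieve_descriptions([0.9, 0.1], toy_index)
```

In a full pipeline, the selected descriptions would then be placed in the prompt alongside the test image for the LVLM to classify; that call is omitted here because the abstract does not describe its interface.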

Taxonomy

Topics: Video Surveillance and Tracking Methods · Human Pose and Action Recognition · Emotion and Mood Recognition