Contextual Emotion Estimation from Image Captions
Vera Yang, Archita Srivastava, Yasaman Etesam, Chuxuan Zhang, Angelica Lim

TL;DR
This paper investigates whether large language models can estimate human emotions from image captions. It finds that GPT-3.5 gives reasonable emotion predictions from scene descriptions, with accuracy that varies by emotion concept.
Contribution
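The study introduces a method that uses image captions and LLMs for emotion estimation. It contributes a set of natural language descriptors for faces, bodies, interactions, and environments, along with manually written captions and emotion annotations for a subset of 331 images from the EMOTIC dataset.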
Findings
GPT-3.5 provides reasonable emotion predictions
Caption-based approach offers interpretability for emotion perception
Accuracy varies depending on the emotion concept
Abstract
Emotion estimation in images is a challenging task, typically using computer vision methods to directly estimate people's emotions using face, body pose and contextual cues. In this paper, we explore whether Large Language Models (LLMs) can support the contextual emotion estimation task, by first captioning images, then using an LLM for inference. First, we must understand: how well do LLMs perceive human emotions? And which parts of the information enable them to determine emotions? One initial challenge is to construct a caption that describes a person within a scene with information relevant for emotion perception. Towards this goal, we propose a set of natural language descriptors for faces, bodies, interactions, and environments. We use them to manually generate captions and emotion annotations for a subset of 331 images from the EMOTIC dataset. These captions offer an…
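The caption-then-infer idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `PersonDescription` class, the `build_caption` and `build_prompt` helpers, the prompt wording, and the candidate emotion labels are all hypothetical, and the call to an actual LLM is deliberately left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PersonDescription:
    """Hypothetical container for the four descriptor groups named in the abstract."""
    subject: str           # e.g. "A woman"
    face: str = ""         # facial descriptor, e.g. "is smiling"
    body: str = ""         # body pose descriptor
    interaction: str = ""  # interaction with other people
    environment: str = ""  # scene or setting


def build_caption(d: PersonDescription) -> str:
    """Join the non-empty descriptors into one caption sentence."""
    details = ", ".join(p for p in (d.face, d.body, d.interaction) if p)
    caption = d.subject
    if details:
        caption += " " + details
    if d.environment:
        caption += " " + d.environment
    return caption + "."


def build_prompt(caption: str, candidate_emotions: List[str]) -> str:
    """Wrap a caption in an illustrative prompt (assumed wording, not the paper's)."""
    options = ", ".join(candidate_emotions)
    return (
        f"Caption: {caption}\n"
        f"Which of these emotions is the person most likely feeling? Options: {options}\n"
        "Answer with one option."
    )


if __name__ == "__main__":
    person = PersonDescription(
        subject="A woman",
        face="is smiling",
        body="with her arms raised",
        interaction="while hugging a friend",
        environment="at a birthday party",
    )
    # The label list below is a placeholder, not the annotation scheme used in the paper.
    print(build_prompt(build_caption(person), ["Happiness", "Sadness", "Anger"]))
```

The resulting prompt string would then be sent to a language model. Keeping the descriptors in separate fields makes it easy to drop one group (for example, the environment) and see how the prediction changes, which is one way to probe which parts of the caption carry the emotional signal.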
Taxonomy
Topics: Multimodal Machine Learning Applications · Sentiment Analysis and Opinion Mining · Image Retrieval and Classification Techniques
Methods: Multi-Head Attention · Attention Is All You Need · Cosine Annealing · Residual Connection · Attention Dropout · Adam · Layer Normalization · Dropout
