TL;DR
ArtEmis introduces a large-scale dataset linking visual artworks with emotional responses and explanations, enabling models to generate emotionally expressive and semantically rich captions that reflect viewers' affective experiences.
Contribution
The paper presents a novel dataset and models that connect visual art, emotional impact, and language explanations, advancing affective understanding in computer vision.
Findings
Models trained on the dataset can generate captions that reflect emotional and abstract content.
The dataset contains 439K human emotion annotations, each paired with a written explanation.
Generated captions often capture semantic and emotional aspects beyond visual features.
Abstract
We present a novel large-scale dataset and accompanying machine learning models aimed at providing a detailed understanding of the interplay between visual content, its emotional effect, and explanations for the latter in language. In contrast to most existing annotation datasets in computer vision, we focus on the affective experience triggered by visual artworks and ask the annotators to indicate the dominant emotion they feel for a given image and, crucially, to also provide a grounded verbal explanation for their emotion choice. As we demonstrate below, this leads to a rich set of signals for both the objective content and the affective impact of an image, creating associations with abstract concepts (e.g., "freedom" or "love"), or references that go beyond what is directly visible, including visual similes and metaphors, or subjective references to personal experiences. We focus on…
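The annotation scheme described in the abstract (a dominant emotion plus a verbal explanation per annotator and image) could be represented roughly as follows. This is a minimal sketch, not the authors' actual data format: the `Annotation` class, its field names, the `emotion_distribution` helper, the artwork identifiers, and the sample emotion labels and explanations are all hypothetical and made up for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One annotator's response to one artwork (hypothetical schema)."""
    artwork_id: str   # illustrative identifier, not a real dataset key
    emotion: str      # dominant emotion the annotator reported
    explanation: str  # free-text explanation for that emotion choice


def emotion_distribution(annotations, artwork_id):
    """Return the normalized emotion histogram for one artwork.

    Several annotators may respond to the same image, so the result is a
    distribution over emotions rather than a single label.
    """
    counts = Counter(a.emotion for a in annotations if a.artwork_id == artwork_id)
    total = sum(counts.values())
    if total == 0:
        return {}  # no annotations recorded for this artwork
    return {emotion: n / total for emotion, n in counts.items()}


# Made-up sample records, for illustration only.
SAMPLE = [
    Annotation("painting-001", "awe", "The sky feels endless, like freedom."),
    Annotation("painting-001", "awe", "The scale of the mountains is humbling."),
    Annotation("painting-001", "sadness", "It reminds me of leaving home."),
    Annotation("painting-002", "contentment", "The colors are warm and calm."),
]

if __name__ == "__main__":
    print(emotion_distribution(SAMPLE, "painting-001"))
```

Keeping the explanation alongside the emotion label in each record is what would let a captioning model be trained on the text while the per-artwork distribution summarizes how viewers disagree.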
