TL;DR
This paper introduces Attribute-Grounded Selective Reasoning (AGSR) for artwork emotion understanding. It uses a new human-annotated salience dataset and a supervised multi-agent framework to improve the interpretability and accuracy of emotion analysis.
Contribution
The paper proposes an AGSR framework built on formal-attribute bottleneck-guided reasoning, and extends EmoArt with human salience annotations that supervise attribute selection.
Findings
FAB-G (the formal-attribute bottleneck-guided reasoning approach) improves emotion, arousal, and valence prediction accuracy.
FAB-G produces more compact and human-aligned explanations.
Attribute-grounded salience transfers across datasets.
Abstract
Multimodal large language models (MLLMs) can produce fluent artwork emotion explanations, but they often suffer from attribute flooding: they enumerate many visible formal attributes without identifying which cues actually support the affective judgment. We therefore formulate artwork emotion understanding as Attribute-Grounded Selective Reasoning (AGSR), where predefined formal attributes serve as evidence units and only emotionally operative attributes should enter the final interpretation. To make this problem measurable, we extend EmoArt, originally introduced at ACM MM 2025 as a 132,664-artwork resource with content, formal-attribute, valence-arousal, and emotion annotations, by adding a 1,400-artwork human salience extension annotated by 15 art-trained annotators. This extension provides instance-level supervision for distinguishing attributes that are merely present from those…
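The selection step described in the abstract can be pictured with a minimal Python sketch. It is not the paper's method: the attribute names, the `salience` score, the 0.5 threshold, and the `select_operative` helper are all hypothetical, chosen only to show the difference between attributes that are merely present and attributes that enter the final interpretation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttributeEvidence:
    """One predefined formal attribute treated as an evidence unit."""
    name: str        # hypothetical attribute name, e.g. "color"
    present: bool    # whether the attribute is visible in the artwork
    salience: float  # hypothetical salience score in [0, 1]


def select_operative(attributes, threshold=0.5):
    """Return names of attributes that are present AND salient enough.

    The threshold is an assumption for illustration; the source does not
    specify how salience is turned into a selection decision.
    """
    return [a.name for a in attributes if a.present and a.salience >= threshold]


# Hypothetical annotations for a single artwork.
example = [
    AttributeEvidence("color", True, 0.9),
    AttributeEvidence("brushstroke", True, 0.2),
    AttributeEvidence("composition", True, 0.7),
    AttributeEvidence("line", False, 0.0),
]

selected = select_operative(example)
print(selected)  # ['color', 'composition']
```

In this toy case "brushstroke" is present but not selected, which is the situation attribute flooding fails to handle: an explanation that lists every visible attribute would include it even though it carries little of the affective judgment.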
