VAEER: Visual Attention-Inspired Emotion Elicitation Reasoning
Fanhang Man, Xiaoyue Chen, Huandong Wang, Baining Zhao, Han Li, Xinlei Chen

TL;DR
VAEER is an interpretable framework that predicts multiple emotions evoked by images using attention-inspired cues and knowledge-grounded reasoning, achieving state-of-the-art results across diverse datasets.
Contribution
The paper introduces VAEER, an interpretable multi-label visual emotion elicitation framework that combines attention-inspired cue extraction with structured affective knowledge to produce transparent, emotion-specific rationales.
Findings
Achieves up to 19% per-emotion improvement in emotion prediction.
Outperforms strong CNN and VLM baselines with a 12.3% average gain.
Effective across three heterogeneous benchmarks, including social imagery and disaster-related photos.
Abstract
Images shared online strongly influence emotions and public well-being. Understanding the emotions an image elicits is therefore vital for fostering healthier and more sustainable digital communities, especially during public crises. We study Visual Emotion Elicitation (VEE), predicting the set of emotions that an image evokes in viewers. We introduce VAEER, an interpretable multi-label VEE framework that combines attention-inspired cue extraction with knowledge-grounded reasoning. VAEER isolates salient visual foci and contextual signals, aligns them with structured affective knowledge, and performs per-emotion inference to yield transparent, emotion-specific rationales. Across three heterogeneous benchmarks, including social imagery and disaster-related photos, VAEER achieves state-of-the-art results with up to 19% per-emotion improvements and a 12.3% average gain over strong CNN and…
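The pipeline described in the abstract (extract cues, align them with affective knowledge, then decide each emotion separately and keep the supporting cues as a rationale) can be illustrated with a small sketch. This is not the authors' implementation: the knowledge table, its weights, the noisy-OR scoring rule, the 0.5 threshold, and all names (`AFFECTIVE_KNOWLEDGE`, `infer_emotions`, `EmotionDecision`) are hypothetical choices made only to show what multi-label, per-emotion inference with rationales could look like.

```python
from dataclasses import dataclass, field

# Hypothetical affective knowledge: cue -> {emotion: association weight in [0, 1]}.
# These entries are illustrative assumptions, not values from the paper.
AFFECTIVE_KNOWLEDGE = {
    "flooded street": {"fear": 0.8, "sadness": 0.6},
    "rescue worker": {"hope": 0.7, "sadness": 0.2},
    "smiling child": {"joy": 0.9, "hope": 0.4},
}


@dataclass
class EmotionDecision:
    emotion: str
    score: float
    elicited: bool
    rationale: list = field(default_factory=list)  # cues that support this emotion


def infer_emotions(cues, emotions, threshold=0.5):
    """Decide each emotion independently (multi-label) from the extracted cues."""
    decisions = []
    for emotion in emotions:
        # Keep only cues that the knowledge table links to this emotion.
        support = []
        for cue in cues:
            weight = AFFECTIVE_KNOWLEDGE.get(cue, {}).get(emotion, 0.0)
            if weight > 0.0:
                support.append((cue, weight))

        # Noisy-OR combination (an assumed scoring rule):
        # the emotion is missed only if every supporting cue fails to evoke it.
        miss_probability = 1.0
        for _, weight in support:
            miss_probability *= 1.0 - weight
        score = 1.0 - miss_probability

        decisions.append(
            EmotionDecision(
                emotion=emotion,
                score=score,
                elicited=score >= threshold,
                rationale=[cue for cue, _ in support],
            )
        )
    return decisions


if __name__ == "__main__":
    cues = ["flooded street", "rescue worker"]  # assumed output of a cue extractor
    for decision in infer_emotions(cues, ["fear", "sadness", "hope", "joy"]):
        print(decision)
```

Because each emotion is scored on its own, an image can elicit several emotions at once, and each positive decision carries the cues that produced it, which is the kind of emotion-specific rationale the abstract refers to.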
Taxonomy
Topics: Image Retrieval and Classification Techniques · Multimodal Machine Learning Applications · Video Analysis and Summarization
Methods: ALIGN
