Do Vision-Language Models Understand Visual Persuasiveness?
Gyuwon Park

TL;DR
This paper investigates whether vision-language models truly understand visual persuasion by analyzing their ability to predict human judgments, revealing limitations in linking objects to communicative intent and proposing strategies to improve reasoning.
Contribution
The paper introduces a high-consensus dataset for binary persuasiveness judgment and a taxonomy of Visual Persuasive Factors (VPFs), uses them to analyze how well VLMs understand visual persuasion, and tests two intervention strategies: cognitive steering and knowledge injection.
Findings
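High-level semantic alignment between the message and the objects present is the strongest predictor of human persuasiveness judgments.
VLMs show a recall-oriented bias: they over-predict high persuasiveness and have weak discriminative power for low- and mid-level features.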
Concise, object-grounded rationales improve model performance.
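To make the over-prediction finding concrete, the following minimal sketch shows how a recall-oriented bias appears in standard binary classification metrics. The function name `binary_metrics` and the toy labels are hypothetical illustrations, not the paper's evaluation code or data; the sketch only assumes that 1 means "high persuasiveness" and 0 means "low persuasiveness".

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Precision, recall, and positive rates for binary labels (1 = persuasive)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    return {
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        # Share of items the model calls persuasive vs. the share humans do.
        "predicted_positive_rate": (tp + fp) / len(pairs),
        "actual_positive_rate": (tp + fn) / len(pairs),
    }


# Hypothetical toy labels (not from the paper's dataset).
human = [1, 1, 0, 0, 0, 1, 0, 0]
model = [1, 1, 1, 1, 0, 1, 1, 0]

metrics = binary_metrics(human, model)
print(metrics)
```

In this toy case the model labels 75% of items as persuasive while humans label 37.5% that way, so recall is perfect but precision is only 0.5. High recall paired with low precision and an inflated predicted-positive rate is the pattern that a recall-oriented bias would produce.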
Abstract
Recent advances in vision-language models (VLMs) have enabled impressive multi-modal reasoning and understanding. Yet whether these models truly grasp visual persuasion, that is, how visual cues shape human attitudes and decisions, remains unclear. To probe this question, we construct a high-consensus dataset for binary persuasiveness judgment and introduce the taxonomy of Visual Persuasive Factors (VPFs), encompassing low-level perceptual, mid-level compositional, and high-level semantic cues. We also explore cognitive steering and knowledge injection strategies for persuasion-relevant reasoning. Empirical analysis across VLMs reveals a recall-oriented bias (models over-predict high persuasiveness) and weak discriminative power for low/mid-level features. In contrast, high-level semantic alignment between message and object presence emerges as the strongest predictor of human judgment. Among…
Taxonomy
Topics: Multimodal Machine Learning Applications · Language, Metaphor, and Cognition · Visual Attention and Saliency Detection
