TL;DR
This paper introduces MMPersuade, a comprehensive dataset and evaluation framework for studying how large vision-language models are influenced by multimodal persuasive content across various contexts.
Contribution
It provides a novel dataset and evaluation tools to systematically analyze the persuasion dynamics and susceptibility of LVLMs to multimodal inputs.
Findings
Multimodal inputs significantly increase persuasion effectiveness and susceptibility.
Prior preferences reduce susceptibility but multimodal influence remains strong.
Effectiveness of persuasion strategies varies across contexts, with reciprocity, credibility, and logic being most effective.
Abstract
As Large Vision-Language Models (LVLMs) are increasingly deployed in domains such as shopping, health, and news, they are exposed to pervasive persuasive content. A critical question is how these models function as persuadees: how and why they can be influenced by persuasive multimodal inputs. Understanding both their susceptibility to persuasion and the effectiveness of different persuasive strategies is crucial, as overly persuadable models may adopt misleading beliefs, override user preferences, or generate unethical or unsafe outputs when exposed to manipulative messages. We introduce MMPersuade, a unified framework for systematically studying multimodal persuasion dynamics in LVLMs. MMPersuade contributes (i) a comprehensive multimodal dataset that pairs images and videos with established persuasion principles across commercial, subjective and behavioral, and adversarial contexts,…
Peer Reviews
Decision·ICLR 2026 Conference Desk Rejected Submission
I liked the following things about the paper: 1. The Persuasion Discounted Cumulative Gain (PDCG) Metric - The paper adapts the concept of discounted cumulative gain (DCG) from information retrieval to model persuasion dynamics. The metric captures both the magnitude and timing of attitude shifts by rewarding early and strong persuasion outcomes. This design provides a unified quantitative measure that integrates explicit agreement scoring and implicit belief estimation through token probabili
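To make the reviewer's description of PDCG concrete, here is a minimal sketch of a DCG-style score over conversation turns. It is an illustration only, not the paper's definition: the function name `dcg_style_persuasion_score`, the per-turn agreement values in [0, 1], and the `log2(turn + 1)` discount (borrowed from standard DCG) are all assumptions.

```python
import math


def dcg_style_persuasion_score(agreement_by_turn):
    """Hypothetical DCG-style score (not the paper's exact PDCG formula).

    agreement_by_turn: assumed per-turn agreement gains in [0, 1], ordered
    from the first persuasion turn to the last.
    """
    score = 0.0
    for turn, gain in enumerate(agreement_by_turn, start=1):
        # Standard DCG discount: later turns contribute less for the same gain.
        score += gain / math.log2(turn + 1)
    return score


# The same total agreement shift, reached at turn 1 versus turn 3.
early = dcg_style_persuasion_score([1.0, 0.0, 0.0])
late = dcg_style_persuasion_score([0.0, 0.0, 1.0])
print(early, late)
```

Under these assumptions, a model that yields on the first turn scores higher than one that yields only on the third, which matches the review's point that the metric rewards early and strong persuasion outcomes.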
1. Motivation and Conceptual Framing: The paper lacks a clear motivation for why persuasion, a fundamentally human and social phenomenon, should be studied within the scope of purely generative LVLMs. Persuasion involves complex constructs such as intention, emotional response, and belief updating—dimensions that generative models do not genuinely possess. Therefore, before introducing a dataset or benchmark, the authors should articulate why modeling persuasion in LVLMs is meaningful. Is the go
1- This paper studies an important gap in multimodal persuasion research. Multimodal persuasion and the persuasiveness of visual content are emerging topics and are still underexplored. Specifically, this work introduces the first multimodal persuasiveness benchmark in dialogues and studies LVLM agents as persuadees. The benchmark studies both image and video generative content, and its pipeline and metrics are grounded in the persuasion literature. 2- This work proposes a new yet simple metric.
1- **Some related works are missing**. While I agree that the proposed benchmark is different and novel in the multimodal setting, there exist several multimodal persuasive benchmarks/data. Hence, it's important to acknowledge the existing works and clearly describe how MMPersuade differs from existing benchmarks. [1-4] are some main examples; however, there are more related works that I encourage authors to acknowledge and discuss in the related work section of the next
The paper introduces a comprehensive set of multimodal persuasion scenarios covering diverse contexts and strategy types. As an early benchmark for multimodal persuasion evaluation, it provides a unified framework to compare LVLM behaviors when exposed to persuasive inputs. This resource can support research on both mitigating undesirable persuasion and guiding LVLMs toward appropriate responses under adversarial, multimodal persuasive prompts.
# 1. Potential bias in model comparison (RQ1)
The benchmark partially relies on GPT-generated persuasive content. (The DailyPersuasion dataset used in the commercial and subjective contexts is generated by GPT-4.) This raises concerns about favorably biasing GPT-based models and disadvantaging others such as Gemini. Consequently, claims about relative resistance or compliance across models may reflect dataset generation artifacts rather than inherent model differences.
# 2. Lack of Rationale B
