MMPersuade: A Dataset and Evaluation Framework for Multimodal Persuasion

Haoyi Qiu; Yilun Zhou; Pranav Narayanan Venkit; Kung-Hsiang Huang; Jiaxin Zhang; Nanyun Peng; Chien-Sheng Wu

arXiv:2510.22768·cs.CL·October 28, 2025

TL;DR

This paper introduces MMPersuade, a comprehensive dataset and evaluation framework for studying how large vision-language models are influenced by multimodal persuasive content across various contexts.

Contribution

It provides a novel dataset and evaluation tools to systematically analyze the persuasion dynamics and susceptibility of LVLMs to multimodal inputs.

Findings

01

Multimodal inputs significantly increase persuasion effectiveness and susceptibility.

02

Prior preferences reduce susceptibility but multimodal influence remains strong.

03

Effectiveness of persuasion strategies varies across contexts, with reciprocity, credibility, and logic being most effective.

Abstract

As Large Vision-Language Models (LVLMs) are increasingly deployed in domains such as shopping, health, and news, they are exposed to pervasive persuasive content. A critical question is how these models function as persuadees: how and why they can be influenced by persuasive multimodal inputs. Understanding both their susceptibility to persuasion and the effectiveness of different persuasive strategies is crucial, as overly persuadable models may adopt misleading beliefs, override user preferences, or generate unethical or unsafe outputs when exposed to manipulative messages. We introduce MMPersuade, a unified framework for systematically studying multimodal persuasion dynamics in LVLMs. MMPersuade contributes (i) a comprehensive multimodal dataset that pairs images and videos with established persuasion principles across commercial, subjective and behavioral, and adversarial contexts,…

Peer Reviews

Decision·ICLR 2026 Conference Desk Rejected Submission

Reviewer 01·Rating 2·Confidence 4

Strengths

I liked the following things about the paper: 1. The Persuasion Discounted Cumulative Gain (PDCG) Metric - The paper adapts the concept of discounted cumulative gain (DCG) from information retrieval to model persuasion dynamics. The metric captures both the magnitude and timing of attitude shifts by rewarding early and strong persuasion outcomes. This design provides a unified quantitative measure that integrates explicit agreement scoring and implicit belief estimation through token probabili
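The discounting idea the reviewer describes can be illustrated with a minimal sketch. This is not the paper's PDCG definition, which is not reproduced on this page; it simply applies the standard DCG logarithmic discount to a per-turn score. The function name `discounted_persuasion_gain` and the per-turn `gains` values are hypothetical, chosen only for illustration.

```python
import math
from typing import Sequence


def discounted_persuasion_gain(gains: Sequence[float]) -> float:
    """DCG-style score over a dialogue (illustrative, not the paper's PDCG).

    gains[t] is an assumed persuasion gain observed at turn t (0-indexed),
    e.g. 1.0 if the model is fully persuaded at that turn and 0.0 otherwise.
    Each gain is divided by log2(turn + 2), the standard DCG discount, so
    the same gain counts for more when it occurs earlier.
    """
    return sum(g / math.log2(turn + 2) for turn, g in enumerate(gains))


# Same total gain, different timing: persuasion at turn 1 vs. turn 3.
early = discounted_persuasion_gain([1.0, 0.0, 0.0])
late = discounted_persuasion_gain([0.0, 0.0, 1.0])
print(early > late)  # early persuasion receives the higher score
```

Under this assumed discount, a model persuaded in the first turn scores higher than one persuaded only in the third, which matches the "rewarding early and strong persuasion outcomes" behavior the review attributes to the metric.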

Weaknesses

1. Motivation and Conceptual Framing: The paper lacks a clear motivation for why persuasion, a fundamentally human and social phenomenon, should be studied within the scope of purely generative LVLMs. Persuasion involves complex constructs such as intention, emotional response, and belief updating—dimensions that generative models do not genuinely possess. Therefore, before introducing a dataset or benchmark, the authors should articulate why modeling persuasion in LVLMs is meaningful. Is the go

Reviewer 02·Rating 6·Confidence 3

Strengths

1- This paper addresses an important gap in multimodal persuasion research. Multimodal persuasion and the persuasiveness of visual content are emerging topics that remain underexplored. Specifically, this work introduces the first multimodal persuasiveness benchmark in dialogues and studies LVLM agents as persuadees. The benchmark covers both image and video generative content, and its pipeline and metrics are grounded in the persuasion literature. 2- This work proposes a new yet simple metric.

Weaknesses

1- **Some related works are missing**. While I agree that the proposed benchmark is different and novel in the multimodal setting, several multimodal persuasive benchmarks and datasets already exist. Hence, it's important to acknowledge the existing works and clearly describe how MMPersuade differs from existing benchmarks. [1-4] are some main examples; however, there are more related works that I encourage authors to acknowledge and discuss in the related work section of the next

Reviewer 03·Rating 2·Confidence 4

Strengths

The paper introduces a comprehensive set of multimodal persuasion scenarios covering diverse contexts and strategy types. As an early benchmark for multimodal persuasion evaluation, it provides a unified framework to compare LVLM behaviors when exposed to persuasive inputs. This resource can support research on both mitigating undesirable persuasion and guiding LVLMs toward appropriate responses under adversarial, multimodal persuasive prompts.

Weaknesses

# 1. Potential bias in model comparison (RQ1)

The benchmark partially relies on GPT-generated persuasive content. (The DailyPersuasion dataset used in the commercial and subjective contexts is generated by GPT-4.) This raises concerns about favorably biasing GPT-based models and disadvantaging others such as Gemini. Consequently, claims about relative resistance or compliance across models may reflect dataset generation artifacts rather than inherent model differences.

# 2. Lack of Rationale B
