Mobilizing Waldo: Evaluating Multimodal AI for Public Mobilization
Manuel Cebrian, Petter Holme, and Niccolo Pescetelli

TL;DR
This paper introduces a novel framework using 'Where's Waldo?' images to ethically evaluate multimodal LLMs' abilities in social influence and mobilization scenarios, highlighting their strengths and limitations.
Contribution
It presents a controlled, replicable testing environment for assessing multimodal LLMs' social understanding and strategic capabilities in public mobilization contexts.
Findings
Models generate creative strategies and vivid descriptions.
Models struggle to accurately identify individuals.
Models cannot reliably assess social dynamics.
Abstract
Advancements in multimodal Large Language Models (LLMs), such as OpenAI's GPT-4o, offer significant potential for mediating human interactions across various contexts. However, their use in areas such as persuasion, influence, and recruitment raises ethical and security concerns. To evaluate these models ethically in public influence and persuasion scenarios, we developed a prompting strategy using "Where's Waldo?" images as proxies for complex, crowded gatherings. This approach provides a controlled, replicable environment to assess the model's ability to process intricate visual information, interpret social dynamics, and propose engagement strategies while avoiding privacy concerns. By positioning Waldo as a hypothetical agent tasked with face-to-face mobilization, we analyzed the model's performance in identifying key individuals and formulating mobilization tactics. Our results…
Taxonomy
Topics: Smart Cities and Technologies
