VisualLeakBench: Auditing the Fragility of Large Vision-Language Models against PII Leakage and Social Engineering
Youting Wang, Yuan Tang, Yitian Qian, Chen Zhao

TL;DR
This paper introduces VisualLeakBench, an evaluation suite that audits large vision-language models for OCR injection and contextual PII leakage, and reports where four frontier models are vulnerable and how much mitigation prompts help.
Contribution
It provides a benchmark of 1,000 synthetically generated adversarial images covering 8 PII types, validated on 50 real-world screenshots, to systematically measure PII leakage and OCR injection success on state-of-the-art LVLMs, with and without mitigation.
Findings
Claude 4 has the highest PII attack success rate (ASR) at 74.4% but the lowest OCR injection ASR at 14.2%.
Grok-4 shows much lower PII leakage (20.4%) than Claude 4.
Mitigation prompts significantly reduce PII leakage in some models.
Abstract
As Large Vision-Language Models (LVLMs) are increasingly deployed in agent-integrated workflows and other deployment-relevant settings, their robustness against semantic visual attacks remains under-evaluated: alignment is typically tested on explicit harmful content rather than privacy-critical multimodal scenarios. We introduce VisualLeakBench, an evaluation suite to audit LVLMs against OCR Injection and Contextual PII Leakage using 1,000 synthetically generated adversarial images with 8 PII types, validated on 50 in-the-wild (IRL) real-world screenshots spanning diverse visual contexts. We evaluate four frontier systems (GPT-5.2, Claude 4, Gemini-3 Flash, Grok-4) with Wilson 95% confidence intervals. Claude 4 achieves the lowest OCR ASR (14.2%) but the highest PII ASR (74.4%), exhibiting a comply-then-warn pattern, where verbatim data disclosure precedes any safety-oriented…
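The abstract reports attack success rates with Wilson 95% confidence intervals. As a rough illustration of how such an interval is obtained from a success count, here is a minimal Python sketch of the standard Wilson score interval. The function name `wilson_interval` and the sample size `n = 500` are assumptions made for illustration; the excerpt does not state how many images fall under each attack type, so the printed intervals are not the paper's.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must be between 0 and n")

    p_hat = successes / n
    z2 = z * z
    denom = 1.0 + z2 / n
    # The interval is centred on a value shrunk towards 0.5, not on p_hat itself.
    center = (p_hat + z2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z2 / (4 * n * n)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the extremes.
    return max(0.0, center - half_width), min(1.0, center + half_width)


if __name__ == "__main__":
    n = 500  # ASSUMPTION: hypothetical per-task sample size, not taken from the paper
    reported_rates = {
        "Claude 4, PII ASR": 0.744,
        "Claude 4, OCR ASR": 0.142,
        "Grok-4, PII ASR": 0.204,
    }
    for label, rate in reported_rates.items():
        successes = round(rate * n)
        low, high = wilson_interval(successes, n)
        print(f"{label}: {rate:.1%} (95% CI {low:.1%} to {high:.1%}, assuming n={n})")
```

Unlike the simpler normal-approximation interval, the Wilson interval stays inside [0, 1] and behaves sensibly when the observed rate is close to 0% or 100%, which matters for models that almost never or almost always comply.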
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Advanced Malware Detection Techniques · Multimodal Machine Learning Applications
