VisualLeakBench: Auditing the Fragility of Large Vision-Language Models against PII Leakage and Social Engineering

Youting Wang; Yuan Tang; Yitian Qian; Chen Zhao

arXiv:2603.13385·cs.CV·March 17, 2026

VisualLeakBench: Auditing the Fragility of Large Vision-Language Models against PII Leakage and Social Engineering

Youting Wang, Yuan Tang, Yitian Qian, Chen Zhao

PDF

Open Access

TL;DR

This paper introduces VisualLeakBench, a comprehensive evaluation suite for assessing the robustness of large vision-language models against privacy leaks and social engineering, revealing vulnerabilities and mitigation strategies.

Contribution

It provides a new benchmark with synthetic and real-world data to systematically evaluate PII leakage and visual attacks on state-of-the-art LVLMs, highlighting model vulnerabilities and mitigation effects.

Findings

01

Claude 4 has high PII leakage (74.4%) but low OCR errors (14.2%).

02

Grok-4 shows lower PII leakage (20.4%) across models.

03

Mitigation prompts significantly reduce PII leakage in some models.

Abstract

As Large Vision-Language Models (LVLMs) are increasingly deployed in agent-integrated workflows and other deployment-relevant settings, their robustness against semantic visual attacks remains under-evaluated -- alignment is typically tested on explicit harmful content rather than privacy-critical multimodal scenarios. We introduce VisualLeakBench, an evaluation suite to audit LVLMs against OCR Injection and Contextual PII Leakage using 1,000 synthetically generated adversarial images with 8 PII types, validated on 50 in-the-wild (IRL) real-world screenshots spanning diverse visual contexts. We evaluate four frontier systems (GPT-5.2, Claude~4, Gemini-3 Flash, Grok-4) with Wilson 95% confidence intervals. Claude~4 achieves the lowest OCR ASR (14.2%) but the highest PII ASR (74.4%), exhibiting a comply-then-warn pattern -- where verbatim data disclosure precedes any safety-oriented…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsAdversarial Robustness in Machine Learning · Advanced Malware Detection Techniques · Multimodal Machine Learning Applications