Physical Prompt Injection Attacks on Large Vision-Language Models
Chen Ling, Kai Hu, Hangcheng Liu, Xingshuo Han, Tianwei Zhang, Changhai Ou

TL;DR
This paper introduces a novel black-box physical prompt injection attack on large vision-language models, embedding malicious visual prompts into physical objects to manipulate model outputs without model access.
Contribution
It presents the first physical, query-agnostic attack method that operates solely through visual observation and strategic placement, demonstrating high success rates across multiple models and conditions.
Findings
Achieves up to 98% attack success rate
Effective under varying physical conditions
Works across multiple state-of-the-art LVLMs
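As a reading aid for the success-rate figure above, here is a minimal sketch of how an attack success rate is commonly computed. It assumes the metric is the fraction of trials in which the model followed the injected instruction; the paper's exact definition is not given on this page. The function name and the trial data are hypothetical and are not results from the paper.

```python
def attack_success_rate(outcomes):
    """Return the fraction of trials marked successful.

    `outcomes` is an iterable of booleans, one per trial, where True means
    the model followed the injected instruction (assumed definition).
    """
    outcomes = list(outcomes)
    if not outcomes:
        raise ValueError("need at least one trial")
    return sum(1 for ok in outcomes if ok) / len(outcomes)


# Hypothetical toy data: 49 successful trials out of 50.
toy_trials = [True] * 49 + [False]
print(f"{attack_success_rate(toy_trials):.0%}")
```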
Abstract
Large Vision-Language Models (LVLMs) are increasingly deployed in real-world intelligent systems for perception and reasoning in open physical environments. While LVLMs are known to be vulnerable to prompt injection attacks, existing methods either require access to input channels or depend on knowledge of user queries, assumptions that rarely hold in practical deployments. We propose the first Physical Prompt Injection Attack (PPIA), a black-box, query-agnostic attack that embeds malicious typographic instructions into physical objects perceivable by the LVLM. PPIA requires no access to the model, its inputs, or internal pipeline, and operates solely through visual observation. It combines offline selection of highly recognizable and semantically effective visual prompts with strategic environment-aware placement guided by spatiotemporal attention, ensuring that the injected prompts…
Taxonomy
Topics: Multimodal Machine Learning Applications · Adversarial Robustness in Machine Learning · Advanced Neural Network Applications
