Invisible Injections: Exploiting Vision-Language Models Through Steganographic Prompt Embedding
Chetan Pathade

TL;DR
This paper uncovers security vulnerabilities in vision-language models by demonstrating how malicious prompts can be invisibly embedded in images through steganography, leading to covert manipulation of model behavior.
Contribution
The paper presents what it describes as the first comprehensive study of steganographic prompt injection attacks on VLMs. It develops a multi-domain embedding framework that combines spatial, frequency, and neural steganographic methods, and evaluates that framework across multiple models and datasets.
Findings
Achieved an overall attack success rate of 24.3% (±3.2%, 95% CI) across leading VLMs, including GPT-4V, Claude, and LLaVA.
Neural steganography methods reached up to 31.8% success.
Maintained visual imperceptibility with PSNR > 38 dB and SSIM > 0.94.
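For readers unfamiliar with the PSNR figure quoted above, the following is a minimal sketch of how peak signal-to-noise ratio is conventionally computed for 8-bit images. It is not taken from the paper: the function name `psnr`, the flat-list image representation, and the synthetic cover image are assumptions made for illustration only.

```python
import math


def psnr(original, modified, max_value=255):
    """Return the peak signal-to-noise ratio (dB) of two equal-length sample sequences."""
    if len(original) != len(modified):
        raise ValueError("images must contain the same number of samples")
    # Mean squared error over all pixel samples.
    mse = sum((a - b) ** 2 for a, b in zip(original, modified)) / len(original)
    if mse == 0:
        return math.inf  # identical images: no distortion at all
    return 10 * math.log10(max_value ** 2 / mse)


# Hypothetical 64x64 grayscale cover image, flattened to a list of 8-bit samples.
cover = [(i * 37) % 256 for i in range(64 * 64)]

# Worst case for a one-bit-per-sample change: every sample is off by exactly 1.
stego = [p ^ 1 for p in cover]

value = psnr(cover, stego)
print(f"PSNR: {value:.2f} dB")  # about 48.13 dB, since the MSE is exactly 1
```

Higher values mean less visible distortion, so the reported threshold of PSNR > 38 dB corresponds to a somewhat larger per-pixel change than this worst-case single-bit flip.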
Abstract
Vision-language models (VLMs) have revolutionized multimodal AI applications but introduce novel security vulnerabilities that remain largely unexplored. We present the first comprehensive study of steganographic prompt injection attacks against VLMs, where malicious instructions are invisibly embedded within images using advanced steganographic techniques. Our approach demonstrates that current VLM architectures can inadvertently extract and execute hidden prompts during normal image processing, leading to covert behavioral manipulation. We develop a multi-domain embedding framework combining spatial, frequency, and neural steganographic methods, achieving an overall attack success rate of 24.3% (±3.2%, 95% CI) across leading VLMs including GPT-4V, Claude, and LLaVA, with neural steganography methods reaching up to 31.8%, while maintaining reasonable visual…
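The abstract names spatial-domain steganography as one of three embedding families but does not say which technique is used. As a generic illustration of the spatial idea only, here is a textbook least-significant-bit (LSB) sketch; it is an assumption, not the authors' method, and the function names `embed_lsb` and `extract_lsb`, the 4-byte length header, and the synthetic cover data are all hypothetical.

```python
def embed_lsb(pixels: bytes, payload: bytes) -> bytearray:
    """Hide payload in the lowest bit of each sample, prefixed by a 4-byte length."""
    message = len(payload).to_bytes(4, "big") + payload
    # Unpack the message into individual bits, most significant bit first.
    bits = [(byte >> shift) & 1 for byte in message for shift in range(7, -1, -1)]
    if len(bits) > len(pixels):
        raise ValueError("payload does not fit in the cover data")
    out = bytearray(pixels)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & 0xFE) | bit  # clear the lowest bit, then set it
    return out


def extract_lsb(pixels: bytes) -> bytes:
    """Recover a payload written by embed_lsb."""

    def read_bytes(start: int, count: int) -> bytes:
        result = bytearray()
        for b in range(count):
            value = 0
            for i in range(8):
                value = (value << 1) | (pixels[start + b * 8 + i] & 1)
            result.append(value)
        return bytes(result)

    length = int.from_bytes(read_bytes(0, 4), "big")
    return read_bytes(32, length)  # the payload starts after the 32 header bits


# Hypothetical flattened 64x64 grayscale cover image.
cover = bytes((i * 37) % 256 for i in range(64 * 64))
stego = embed_lsb(cover, b"example payload")
recovered = extract_lsb(stego)
print(recovered)
```

Because each sample changes by at most 1, the distortion stays very small. Plain LSB embedding is also fragile, since resizing or lossy compression typically destroys the hidden bits, which is one reason frequency-domain and learned approaches exist.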
