VISOR: Visual Input-based Steering for Output Redirection in Vision-Language Models
Mansi Phute (Georgia Tech), Ravikumar Balakrishnan (HiddenLayer)

TL;DR
VISOR steers the behavior of vision-language models through optimized visual inputs alone. It redirects outputs without runtime access to model internals and is less conspicuous than explicit textual instructions, which exposes a new security vulnerability.
Contribution
The paper presents VISOR, a universal visual steering technique that uses optimized images to achieve bidirectional behavioral control in VLMs. Because it needs only image inputs, it is compatible with API-based deployments, and it exposes the corresponding security risks.
Findings
A single steering image matches steering-vector performance to within 1-2% for positive behavioral shifts.
For negative steering, VISOR shifts behavior by up to 25% from baseline, well beyond what system prompting achieves.
Maintains 99.9% accuracy on unrelated tasks despite behavioral manipulation.
Abstract
Vision Language Models (VLMs) are increasingly being used in a broad range of applications, bringing their security and behavioral control to the forefront. While existing approaches for behavioral control or output redirection, like system prompting in VLMs, are easily detectable and often ineffective, activation-based steering vectors require invasive runtime access to model internals--incompatible with API-based services and closed-source deployments. We introduce VISOR (Visual Input-based Steering for Output Redirection), a novel method that achieves sophisticated behavioral control through optimized visual inputs alone. By crafting universal steering images that induce target activation patterns, VISOR enables practical deployment across all VLM serving modalities while remaining imperceptible compared to explicit textual instructions. We validate VISOR on LLaVA-1.5-7B across three…
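The core idea in the abstract, optimizing one image so that it induces target activation patterns across many prompts, can be illustrated with a toy sketch. This is not the authors' implementation. The summary gives no objective, layer choice, or optimizer, so everything below is an assumption made for illustration: a single frozen tanh layer stands in for the VLM, the target is the baseline activation shifted by a hypothetical steering vector, and plain projected gradient descent updates the image. All names (`activations`, `optimize_steering_image`, `steer_vec`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
D_IMG, D_TXT, D_HID = 64, 16, 32  # toy dimensions, chosen arbitrarily

# Frozen toy "model" layer (assumption): h = tanh(W_img x + W_txt p)
W_img = rng.normal(scale=0.3, size=(D_HID, D_IMG))
W_txt = rng.normal(scale=0.3, size=(D_HID, D_TXT))


def activations(image, prompts):
    """Hidden activations for one image paired with every prompt (one row each)."""
    return np.tanh(image @ W_img.T + prompts @ W_txt.T)


def loss(image, prompts, target):
    """Mean squared distance between induced and target activations."""
    return float(np.mean((activations(image, prompts) - target) ** 2))


def optimize_steering_image(base_image, prompts, steer_vec, steps=500, lr=0.1):
    """Projected gradient descent on the image only; model weights stay fixed."""
    # Target: baseline activations shifted by the steering vector,
    # clipped so the target stays inside the range tanh can reach.
    target = np.clip(activations(base_image, prompts) + steer_vec, -0.99, 0.99)
    image = base_image.copy()
    for _ in range(steps):
        h = activations(image, prompts)
        # Gradient of the mean squared error w.r.t. the pre-activation.
        grad_pre = 2.0 * (h - target) * (1.0 - h ** 2) / h.size
        # The same image feeds every prompt, so sum the gradient over prompts.
        grad_img = grad_pre.sum(axis=0) @ W_img
        # Step, then project back into the valid pixel range [0, 1].
        image = np.clip(image - lr * grad_img, 0.0, 1.0)
    return image, target


prompts = rng.normal(size=(8, D_TXT))      # stand-ins for prompt embeddings
steer_vec = 0.2 * rng.normal(size=D_HID)   # hypothetical steering direction
base_image = np.full(D_IMG, 0.5)           # flat gray starting image

steering_image, target = optimize_steering_image(base_image, prompts, steer_vec)
print("loss before:", loss(base_image, prompts, target))
print("loss after: ", loss(steering_image, prompts, target))
```

Averaging the loss over several prompts is what makes the image "universal" in this sketch: one set of pixels has to move the activations in the desired direction whatever text accompanies it. A real attack would replace the toy layer with activations read from the actual model during optimization.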
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Multimodal Machine Learning Applications · Security and Verification in Computing
