SteerVLM: Robust Model Control through Lightweight Activation Steering for Vision Language Models
Anushka Sivakumar, Andrew Zhang, Zaber Hakim, Chris Thomas

TL;DR
SteerVLM introduces a lightweight, inference-time control module for vision-language models that steers outputs through activation steering, without retraining, and is accompanied by a new multimodal dataset (VNIA).
Contribution
A novel lightweight activation steering method enabling fine-grained, inference-time control of VLMs without modifying weights, supported by the VNIA dataset for evaluation.
Findings
Outperforms existing intervention techniques on steering benchmarks
Steering module's learned parameters amount to only 0.14% of the original VLM's size
Effectively mitigates hallucinations in VLM outputs
Abstract
This work introduces SteerVLM, a lightweight steering module designed to guide Vision-Language Models (VLMs) towards outputs that better adhere to desired instructions. Our approach learns from the latent embeddings of paired prompts encoding target and converse behaviors to dynamically adjust activations connecting the language modality with image context. This allows for fine-grained, inference-time control over complex output semantics without modifying model weights while preserving performance on off-target tasks. Our steering module requires learning parameters equal to 0.14% of the original VLM's size. Our steering module gains model control through dimension-wise activation modulation and adaptive steering across layers without requiring pre-extracted static vectors or manual tuning of intervention points. Furthermore, we introduce VNIA (Visual Narrative Intent Alignment), a…
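
To make "dimension-wise activation modulation" concrete, here is a minimal, generic sketch in plain Python. It is not the SteerVLM architecture, which this summary does not specify. The function name `steer_activation`, the sigmoid gate, the `gate_weight` and `gate_bias` parameters, and the `strength` scalar are all assumptions made for illustration. The only idea taken from the abstract is that a direction derived from paired target and converse prompt embeddings adjusts an activation separately in each dimension.

```python
import math


def sigmoid(x):
    """Logistic function, used here as a hypothetical per-dimension gate."""
    return 1.0 / (1.0 + math.exp(-x))


def steer_activation(hidden, target_emb, converse_emb,
                     gate_weight, gate_bias, strength=1.0):
    """Shift one activation vector toward the target behavior, per dimension.

    Illustrative assumption, not the paper's method:
      direction_i = target_emb_i - converse_emb_i
      gate_i      = sigmoid(gate_weight_i * hidden_i + gate_bias_i)
      steered_i   = hidden_i + strength * gate_i * direction_i
    """
    steered = []
    for h, t, c, w, b in zip(hidden, target_emb, converse_emb,
                             gate_weight, gate_bias):
        direction = t - c          # points from converse toward target behavior
        gate = sigmoid(w * h + b)  # how strongly this dimension is adjusted
        steered.append(h + strength * gate * direction)
    return steered


# Toy 3-dimensional example; all values are made up.
hidden = [0.5, -1.0, 2.0]
target_emb = [1.0, 0.0, 2.0]
converse_emb = [0.0, 0.0, 3.0]
gate_weight = [0.0, 0.0, 0.0]  # zero weights and biases give a gate of 0.5
gate_bias = [0.0, 0.0, 0.0]

steered = steer_activation(hidden, target_emb, converse_emb,
                           gate_weight, gate_bias)
unchanged = steer_activation(hidden, target_emb, converse_emb,
                             gate_weight, gate_bias, strength=0.0)
```

In this toy setup the base model's weights are never touched: only the small gate parameters would be learned, which is the general reason such a module can stay a small fraction of the model's size. With `strength=0.0` the activation passes through unchanged.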
