One-shot Optimized Steering Vector for Hallucination Mitigation for VLMs
Youxu Shi, Suorong Yang, Dong Liu

TL;DR
This paper introduces OSGA, a one-shot, input-independent steering vector method that enhances vision language models by mitigating hallucinations and safety issues efficiently and effectively across various tasks.
Contribution
The paper proposes OSGA, a novel one-shot optimization framework that learns a universal steering vector for VLMs, reducing the need for multiple optimizations and improving robustness.
Findings
OSGA improves hallucination mitigation across benchmarks.
A single steering vector enhances safety with negligible overhead.
The learned vector is input-independent, so the same vector is applied to every input at inference.
Abstract
Vision Language Models (VLMs) achieve strong performance on multimodal tasks but still suffer from hallucination and safety-related failures that persist even at scale. Steering offers a lightweight technique to improve model performance. However, steering, whether input-dependent or input-independent, struggles to achieve a meaningful trade-off between efficiency and effectiveness. In this work, we observe that steering vectors can generalize across inputs when tasks share aligned semantic intent. Based on this insight, we propose OSGA (One-shot Steering with Generative Anchor), an input-independent framework that improves model performance with a single optimization instance. OSGA first selects an informative sample via a variance-based data selection strategy and learns a single steering vector with a contrastive objective with generative anchor…
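To make "input-independent" concrete, the sketch below shows one common way a fixed steering vector can be applied at inference: the same vector is added to every token's hidden state, whatever the input is. This is a toy illustration and not the authors' implementation. The function name `apply_steering`, the scale `alpha`, the additive form, and the plain-list representation of hidden states are all assumptions made for the example; the abstract does not specify the layer, the scale, or how the vector is injected.

```python
def apply_steering(hidden_states, steering_vector, alpha=1.0):
    """Add one fixed vector to every token's hidden state.

    hidden_states: list of token states, each a list of floats (hypothetical layout).
    steering_vector: list of floats with the same length as each token state.
    alpha: hypothetical scale controlling steering strength.
    """
    steered = []
    for token_state in hidden_states:
        if len(token_state) != len(steering_vector):
            raise ValueError("hidden size and steering vector size differ")
        # The shift does not depend on the token or on the input sequence.
        steered.append([h + alpha * v for h, v in zip(token_state, steering_vector)])
    return steered


# One vector, learned once (values here are made up for illustration).
v = [0.5, -1.0, 0.0]

# Two unrelated inputs of different lengths receive the same shift.
seq_a = [[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]]
seq_b = [[2.0, 2.0, 2.0]]

steered_a = apply_steering(seq_a, v, alpha=2.0)
steered_b = apply_steering(seq_b, v, alpha=2.0)
```

Because the vector is fixed, the extra cost at inference is one addition per hidden state, which is consistent with the "negligible overhead" claim in the findings. An input-dependent method would instead have to compute a new vector for each input.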
Taxonomy
Topics: Multimodal Machine Learning Applications · Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning
