Steering LVLMs via Sparse Autoencoder for Hallucination Mitigation

Zhenglin Hua; Jinghan He; Zijun Yao; Tianxu Han; Haiyun Guo; Yuheng Jia; Junfeng Fang

arXiv:2505.16146 · cs.CV · September 16, 2025

Open Access

TL;DR

This paper introduces a plug-and-play method that uses sparse autoencoders (SAEs) to identify latent directions in large vision-language models (LVLMs) and steer along them, reducing hallucinations with minimal computational overhead.

Contribution

The work proposes SSL (Steering LVLMs via SAE Latent Directions), an approach that uses SAE-derived latent directions to mitigate hallucinations in LVLMs and is reported to outperform existing methods.

Findings

01. SSL significantly reduces hallucinations in LVLMs.

02. The method maintains transferability across different model architectures.

03. SSL incurs negligible additional computational cost.

Abstract

Large vision-language models (LVLMs) have achieved remarkable performance on multimodal tasks. However, they still suffer from hallucinations, generating text that is inconsistent with the visual input, which poses significant risks in real-world applications. Existing approaches to this issue focus on incorporating external knowledge bases, alignment training, or decoding strategies, all of which require substantial computational cost and time. Recent work explores more efficient alternatives that adjust LVLMs' internal representations. Although promising, these methods may suppress hallucinations insufficiently or intervene excessively, harming normal semantics. In this work, we leverage sparse autoencoders (SAEs) to identify semantic directions closely associated with faithfulness or hallucination, extracting more precise and disentangled…
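The abstract is cut off before it gives the method's details, so the following is only a minimal sketch of the general idea it names: use an SAE to find a latent associated with faithfulness, then shift a hidden state along that latent's direction. Everything here is an assumption made for illustration, not the authors' implementation: the ReLU encoder form, the mean-activation-difference rule in the hypothetical `pick_latent`, the use of a decoder row as the steering direction, the strength `alpha`, and the random weights standing in for a trained SAE.

```python
import numpy as np


def sae_encode(h, W_enc, b_enc):
    """ReLU SAE encoder (assumed form): latent activations for hidden states h."""
    return np.maximum(0.0, h @ W_enc + b_enc)


def pick_latent(faithful_h, halluc_h, W_enc, b_enc):
    """Hypothetical selection rule: the latent whose mean activation is most
    elevated on faithful samples relative to hallucinated ones."""
    gap = (sae_encode(faithful_h, W_enc, b_enc).mean(axis=0)
           - sae_encode(halluc_h, W_enc, b_enc).mean(axis=0))
    return int(np.argmax(gap))


def steer(h, direction, alpha):
    """Shift hidden state h by alpha along the unit-normalized direction."""
    unit = direction / np.linalg.norm(direction)
    return h + alpha * unit


# Toy setup: random weights stand in for a trained SAE (assumption).
rng = np.random.default_rng(0)
d_model, d_sae = 16, 64
W_enc = rng.normal(size=(d_model, d_sae))
b_enc = np.zeros(d_sae)
W_dec = rng.normal(size=(d_sae, d_model))

# Stand-ins for hidden states collected on faithful vs. hallucinated outputs.
faithful_h = rng.normal(size=(32, d_model))
halluc_h = rng.normal(size=(32, d_model))

idx = pick_latent(faithful_h, halluc_h, W_enc, b_enc)
direction = W_dec[idx]          # decoder row used as the steering direction
h = rng.normal(size=d_model)    # one hidden state to be steered
h_steered = steer(h, direction, alpha=2.0)
```

Because the direction is normalized before the shift, `alpha` directly sets how far the hidden state moves; in a real model this shift would be applied to a chosen layer's activations during the forward pass, a detail the truncated abstract does not specify.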

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Multimodal Machine Learning Applications