Towards Interpretable Hallucination Analysis and Mitigation in LVLMs via Contrastive Neuron Steering

Guangtao Lyu; Xinyi Cheng; Qi Liu; Chenghao Xu; Jiexi Yan; Muli Yang; Fen Fang; Cheng Deng

arXiv:2602.00621·cs.CV·February 3, 2026

TL;DR

This paper introduces Contrastive Neuron Steering, a method that analyzes and mitigates hallucinations in LVLMs by selectively enhancing or suppressing interpretable neurons, which improves visual grounding.

Contribution

The paper presents a representation-level approach that uses sparse autoencoders and contrastive analysis to identify and control image-specific neurons, improving robustness and interpretability in LVLMs.

Findings

01 Contrastive Neuron Steering (CNS) reduces hallucinations across benchmarks

02 Selective neuron modulation improves visual grounding

03 The method is compatible with existing decoding techniques

Abstract

Large vision-language models (LVLMs) achieve remarkable multimodal understanding and generation but remain susceptible to hallucinations. Existing mitigation methods predominantly focus on output-level adjustments, leaving the internal mechanisms that give rise to these hallucinations largely unexplored. To gain a deeper understanding, we adopt a representation-level perspective by introducing sparse autoencoders (SAEs) to decompose dense visual embeddings into sparse, interpretable neurons. Through neuron-level analysis, we identify distinct neuron types, including always-on neurons and image-specific neurons. Our findings reveal that hallucinations often result from disruptions or spurious activations of image-specific neurons, while always-on neurons remain largely stable. Moreover, selectively enhancing or suppressing image-specific neurons enables controllable intervention in LVLM outputs, improving visual…
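
The pipeline the abstract describes (encode dense visual embeddings with an SAE, separate always-on from image-specific neurons, then enhance or suppress the latter) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the single-layer ReLU encoder, the firing-rate rule for telling neuron types apart, and the 0.9 threshold are all assumptions made for illustration, and the activations are hand-written toy values.

```python
import numpy as np


def sae_encode(x, w_enc, b_enc):
    """Map dense embeddings (n, d) to non-negative sparse codes (n, m)."""
    return np.maximum(x @ w_enc + b_enc, 0.0)


def sae_decode(z, w_dec, b_dec):
    """Map sparse codes (n, m) back to dense embeddings (n, d)."""
    return z @ w_dec + b_dec


def classify_neurons(codes, always_on_frac=0.9, eps=1e-6):
    """Split neurons by how often they fire across a set of images.

    codes: (n_images, m) array of SAE activations, one row per image.
    The 0.9 threshold is an illustrative assumption, not a value from the paper.
    """
    fire_rate = (codes > eps).mean(axis=0)
    always_on = fire_rate >= always_on_frac
    # Assumed rule: fires for some images, but not for nearly all of them.
    image_specific = (fire_rate > 0) & ~always_on
    return always_on, image_specific


def steer(codes, neuron_mask, scale):
    """Return a copy of codes with the masked neurons multiplied by scale."""
    steered = codes.copy()
    steered[..., neuron_mask] *= scale
    return steered


# Toy data: 3 "images", 3 neurons (hand-written activations, not real SAE output).
codes = np.array([[1.0, 0.0, 2.0],
                  [1.5, 0.0, 0.0],
                  [0.5, 3.0, 0.0]])
always_on, image_specific = classify_neurons(codes)
boosted = steer(codes, image_specific, scale=2.0)     # enhance image-specific neurons
suppressed = steer(codes, image_specific, scale=0.0)  # suppress them entirely
```

In this sketch, steered codes would be passed through `sae_decode` to obtain a modified dense embedding for the language model; how the paper actually reinjects the edited representation is not stated in the text above.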

Taxonomy

Topics: Hallucinations in medical conditions · Advanced Image Processing Techniques · Face Recognition and Perception