Towards Interpretable Hallucination Analysis and Mitigation in LVLMs via Contrastive Neuron Steering
Guangtao Lyu, Xinyi Cheng, Qi Liu, Chenghao Xu, Jiexi Yan, Muli Yang, Fen Fang, Cheng Deng

TL;DR
This paper introduces Contrastive Neuron Steering (CNS), a method that analyzes and mitigates hallucinations in large vision-language models (LVLMs) by manipulating interpretable neurons, which improves visual grounding.
Contribution
It presents a representation-level approach using sparse autoencoders and contrastive analysis to identify and control image-specific neurons, enhancing robustness and interpretability in LVLMs.
Findings
CNS reduces hallucinations across benchmarks
Selective neuron modulation improves visual grounding
Method is compatible with existing decoding techniques
Abstract
LVLMs achieve remarkable multimodal understanding and generation but remain susceptible to hallucinations. Existing mitigation methods predominantly focus on output-level adjustments, leaving the internal mechanisms that give rise to these hallucinations largely unexplored. To gain a deeper understanding, we adopt a representation-level perspective by introducing sparse autoencoders (SAEs) to decompose dense visual embeddings into sparse, interpretable neurons. Through neuron-level analysis, we identify distinct neuron types, including always-on neurons and image-specific neurons. Our findings reveal that hallucinations often result from disruptions or spurious activations of image-specific neurons, while always-on neurons remain largely stable. Moreover, selectively enhancing or suppressing image-specific neurons enables controllable intervention in LVLM outputs, improving visual…
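The pipeline the abstract describes (decompose embeddings with an SAE, separate always-on from image-specific neurons, then enhance or suppress the latter) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the ReLU encoder, the firing-rate threshold `always_on_frac`, and the multiplicative `scale` are all assumptions chosen for illustration, and the contrastive part of the method is not modeled here.

```python
# Illustrative sketch only: every name and threshold below is hypothetical,
# not taken from the paper.

def sae_encode(x, W_enc, b_enc):
    """Map one dense embedding (length d) to a sparse, non-negative code
    (length m) with a linear layer followed by ReLU (an assumed SAE form)."""
    d, m = len(x), len(b_enc)
    return [max(0.0, sum(x[i] * W_enc[i][j] for i in range(d)) + b_enc[j])
            for j in range(m)]

def sae_decode(z, W_dec, b_dec):
    """Map a sparse code (length m) back to embedding space (length d)."""
    m, d = len(z), len(b_dec)
    return [sum(z[j] * W_dec[j][i] for j in range(m)) + b_dec[i]
            for i in range(d)]

def classify_neurons(codes, always_on_frac=0.9, eps=1e-6):
    """Split neurons by how often they fire across a set of images.

    codes: one sparse code per image, all of length m.
    A neuron is 'always-on' if it fires on at least `always_on_frac` of the
    images, and 'image-specific' if it fires on some images but fewer.
    """
    n, m = len(codes), len(codes[0])
    rate = [sum(1 for c in codes if c[j] > eps) / n for j in range(m)]
    always_on = [r >= always_on_frac for r in rate]
    image_specific = [0.0 < r < always_on_frac for r in rate]
    return always_on, image_specific

def steer(z, mask, scale):
    """Scale the masked neurons: scale > 1 enhances, scale < 1 suppresses."""
    return [v * scale if keep else v for v, keep in zip(z, mask)]

# Toy codes for three images: neuron 0 fires on every image, neuron 1 on
# one image only, neuron 2 never fires.
codes = [[1.0, 0.0, 0.0],
         [0.5, 2.0, 0.0],
         [0.8, 0.0, 0.0]]
always_on, image_specific = classify_neurons(codes)
steered = steer(codes[1], image_specific, scale=2.0)
```

In this toy setup only neuron 1 is modified by `steer`. Passing the steered code through `sae_decode` would yield the edited embedding that a model would consume in place of the original.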
Taxonomy
Topics: Hallucinations in medical conditions · Advanced Image Processing Techniques · Face Recognition and Perception
