TL;DR
This paper introduces LTS-FS, a layerwise feature steering framework guided by attribution scores to mitigate hallucinations in Large Vision-Language Models without degrading overall performance.
Contribution
It proposes a novel attribution-based, layerwise feature steering method that selectively targets hallucination-relevant layers in LVLMs.
Findings
LTS-FS significantly reduces hallucinations across multiple benchmarks.
The method preserves the performance of LVLMs on general tasks.
Attribution scores effectively identify hallucination-relevant layers.
Abstract
Despite the significant advancements in Large Vision-Language Models (LVLMs), their tendency to generate hallucinations undermines reliability and restricts broader practical deployment. Among the hallucination mitigation methods, feature steering emerges as a promising approach that reduces erroneous outputs in LVLMs without increasing inference costs. However, current methods apply uniform feature steering across all layers. This heuristic strategy ignores inter-layer differences, potentially disrupting layers unrelated to hallucinations and ultimately leading to performance degradation on general tasks. In this paper, we propose Locate-Then-Sparsify for Feature Steering (LTS-FS), a plug-and-play framework which controls the steering intensity according to the hallucination relevance of each layer. We first construct a dataset comprising token-level and sentence-level hallucination…
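To make the contrast between uniform and layerwise steering concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the function names, the top-k sparsification rule, the normalisation by the largest score, and the `base_strength` parameter are all assumptions chosen for illustration. The abstract only states that steering intensity is controlled by each layer's hallucination relevance.

```python
from typing import List


def sparsify_scores(scores: List[float], keep: int) -> List[float]:
    """Hypothetical sparsification: keep the `keep` highest-scoring layers,
    zero the rest, and rescale so the most relevant layer has weight 1."""
    if keep <= 0:
        return [0.0] * len(scores)
    # Indices of the `keep` layers with the largest relevance scores.
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:keep]
    peak = max(scores[i] for i in top)
    if peak <= 0:
        return [0.0] * len(scores)
    kept = set(top)
    # Negative scores are clamped to zero so a layer is never steered backwards.
    return [max(scores[i], 0.0) / peak if i in kept else 0.0
            for i in range(len(scores))]


def steer_layer(hidden: List[float], direction: List[float],
                strength: float) -> List[float]:
    """Add a scaled steering direction to one layer's hidden state."""
    return [h + strength * d for h, d in zip(hidden, direction)]


def steer_all_layers(hiddens: List[List[float]],
                     directions: List[List[float]],
                     scores: List[float],
                     keep: int,
                     base_strength: float = 1.0) -> List[List[float]]:
    """Steer each layer with an intensity proportional to its (assumed)
    relevance weight; layers with weight 0 are left untouched."""
    weights = sparsify_scores(scores, keep)
    return [steer_layer(h, d, base_strength * w)
            for h, d, w in zip(hiddens, directions, weights)]


if __name__ == "__main__":
    # Toy data: three layers, two-dimensional hidden states.
    hiddens = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
    directions = [[1.0, -1.0], [1.0, -1.0], [1.0, -1.0]]
    relevance = [0.1, 0.8, 0.4]  # made-up per-layer relevance scores
    print(steer_all_layers(hiddens, directions, relevance, keep=2, base_strength=2.0))
```

In this toy setting, uniform steering would correspond to giving every layer a weight of 1. Here the least relevant layer is left unchanged and the others are shifted in proportion to their scores, which is the behaviour the abstract contrasts with uniform steering across all layers.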
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Hallucinations in medical conditions · Psychedelics and Drug Studies
