REVIS: Sparse Latent Steering to Mitigate Object Hallucination in Large Vision-Language Models
Jialin Wu, Wei Shi, Han Shen, Peigui Qi, Kunsheng Tang, Zhicong Huang, Binghao Wang, Zhou Yang

TL;DR
REVIS is a training-free method that reduces object hallucination in large vision-language models by sparsely re-activating visual features at specific network depths, improving accuracy with minimal extra computation.
Contribution
The paper introduces a training-free framework that explicitly restores suppressed visual information in large vision-language models (LVLMs) through sparse interventions grounded in latent space geometry.
Findings
REVIS reduces object hallucination rates by approximately 19% compared to state-of-the-art baselines on standard benchmarks.
The method preserves the models' general reasoning capabilities.
REVIS operates with minimal additional computational cost.
Abstract
Despite the advanced capabilities of Large Vision-Language Models (LVLMs), they frequently suffer from object hallucination. One reason is that visual features and pretrained textual representations often become intertwined in the deeper network layers. To address this, we propose REVIS, a training-free framework designed to explicitly re-activate this suppressed visual information. Rooted in latent space geometry, REVIS extracts the pure visual information vector via orthogonal projection and employs a calibrated strategy to perform sparse intervention only at the precise depth where suppression occurs. This surgical approach effectively restores visual information with minimal computational cost. Empirical evaluations on standard benchmarks demonstrate that REVIS reduces object hallucination rates by approximately 19% compared to state-of-the-art baselines, while preserving general…
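The two ideas named in the abstract, orthogonal projection and intervention at a single depth, can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names `orthogonal_component` and `steer_at_layer`, the scaling factor `alpha`, the use of one textual direction to project against, and the toy vectors are all assumptions made for illustration. The abstract does not specify how the visual vector is obtained or how the intervention depth is calibrated, so the layer index is simply passed in by hand here.

```python
import numpy as np

def orthogonal_component(visual, textual):
    """Return the part of `visual` that is orthogonal to `textual`.

    Hypothetical helper: subtracts the projection of `visual` onto `textual`.
    """
    visual = np.asarray(visual, dtype=float)
    textual = np.asarray(textual, dtype=float)
    denom = textual @ textual
    if denom == 0.0:
        # Nothing to project out when the textual direction is zero.
        return visual.copy()
    return visual - (visual @ textual) / denom * textual

def steer_at_layer(hidden_states, direction, layer, alpha=1.0):
    """Add alpha * unit(direction) to the hidden state of ONE layer only.

    `hidden_states` is a list with one array per layer; all other layers
    are returned unchanged (the "sparse" part of this sketch).
    """
    steered = [np.array(h, dtype=float) for h in hidden_states]
    norm = np.linalg.norm(direction)
    if norm > 0.0:
        steered[layer] = steered[layer] + alpha * direction / norm
    return steered

# Toy example with assumed 3-dimensional directions.
visual = np.array([1.0, 1.0, 0.0])
textual = np.array([1.0, 0.0, 0.0])
pure_visual = orthogonal_component(visual, textual)

# Four "layers" of zero hidden states; intervene only at layer 2 (chosen by hand).
hidden = [np.zeros(3) for _ in range(4)]
steered = steer_at_layer(hidden, pure_visual, layer=2, alpha=0.5)
```

In this toy setup, `pure_visual` has no component along `textual`, and only the hidden state at index 2 is changed. A real system would apply the same kind of shift inside the model's forward pass rather than on a list of arrays.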
