ONLY: One-Layer Intervention Sufficiently Mitigates Hallucinations in Large Vision-Language Models

Zifu Wan; Ce Zhang; Silong Yong; Martin Q. Ma; Simon Stepputtis; Louis-Philippe Morency; Deva Ramanan; Katia Sycara; Yaqi Xie

arXiv:2507.00898·cs.CV·July 2, 2025

ONLY: One-Layer Intervention Sufficiently Mitigates Hallucinations in Large Vision-Language Models

Zifu Wan, Ce Zhang, Silong Yong, Martin Q. Ma, Simon Stepputtis, Louis-Philippe Morency, Deva Ramanan, Katia Sycara, Yaqi Xie

PDF

Open Access

TL;DR

ONLY is a training-free, one-layer intervention method that significantly reduces hallucinations in large vision-language models, enabling efficient real-time responses without multiple queries.

Contribution

The paper introduces ONLY, a novel single-query, one-layer intervention technique that effectively mitigates hallucinations in LVLMs without additional training or multiple queries.

Findings

01

OUTPERFORMS state-of-the-art methods across benchmarks

02

Requires only a single query and minimal computational resources

03

Enhances textual output by amplifying crucial information

Abstract

Recent Large Vision-Language Models (LVLMs) have introduced a new paradigm for understanding and reasoning about image input through textual responses. Although they have achieved remarkable performance across a range of multi-modal tasks, they face the persistent challenge of hallucination, which introduces practical weaknesses and raises concerns about their reliable deployment in real-world applications. Existing work has explored contrastive decoding approaches to mitigate this issue, where the output of the original LVLM is compared and contrasted with that of a perturbed version. However, these methods require two or more queries that slow down LVLM response generation, making them less suitable for real-time applications. To overcome this limitation, we propose ONLY, a training-free decoding approach that requires only a single query and a one-layer intervention during decoding,…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsAdversarial Robustness in Machine Learning · Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis