VaLiD: Mitigating the Hallucination of Large Vision Language Models by Visual Layer Fusion Contrastive Decoding
Jiaqi Wang, Yifei Gao, Jitao Sang

TL;DR
VaLiD introduces a visual layer fusion contrastive decoding approach that mitigates hallucinations in large vision-language models by correcting visual encoding distortions, significantly improving the accuracy of generated content.
Contribution
The paper presents a novel visual encoding perspective and a contrastive decoding method to effectively reduce hallucinations in LVLMs, outperforming existing inference-time mitigation techniques.
Findings
VaLiD reduces hallucinations across multiple benchmarks.
It achieves state-of-the-art performance compared to baseline methods.
Visual layer fusion improves the reliability of model outputs.
Abstract
Large Vision-Language Models (LVLMs) have demonstrated remarkable capabilities in multimodal task reasoning. However, they often generate responses that appear plausible yet do not accurately reflect the visual content, a phenomenon known as hallucination. Recent approaches have introduced training-free methods to mitigate hallucinations by adjusting the decoding strategy during the inference stage, typically attributing hallucinations to the language model itself. Our analysis, however, reveals that distortions in the visual encoding process significantly affect the model's reasoning capabilities. Specifically, earlier visual layers may retain key features but gradually distort as the information propagates toward the output layer. Building on these insights, we propose a novel hallucination-mitigation method from the visual encoding perspective: Visual Layer…
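To make the decoding idea concrete, here is a minimal sketch of a generic contrastive decoding step of the kind used by inference-time mitigation methods. It is not the paper's implementation: the abstract shown here is truncated before it describes how VaLiD fuses visual layers or which branch plays which role, so the function names, the weight `alpha`, and the plausibility cutoff `beta` are all illustrative assumptions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def contrastive_decode_step(primary_logits, contrast_logits, alpha=1.0, beta=0.1):
    """Generic contrastive decoding for one token position (assumed form).

    primary_logits: next-token logits from the branch being trusted.
    contrast_logits: next-token logits from the branch being contrasted against.
    alpha: hypothetical contrast strength.
    beta: hypothetical plausibility cutoff; tokens whose primary probability
          is below beta * max probability are masked out.
    """
    if len(primary_logits) != len(contrast_logits):
        raise ValueError("both branches must score the same vocabulary")

    probs = softmax(primary_logits)
    cutoff = beta * max(probs)

    scores = []
    for p_logit, c_logit, prob in zip(primary_logits, contrast_logits, probs):
        if prob < cutoff:
            # Implausible under the primary branch: never selected.
            scores.append(float("-inf"))
        else:
            # Amplify what the primary branch prefers over the contrast branch.
            scores.append((1 + alpha) * p_logit - alpha * c_logit)
    return scores


def greedy_pick(scores):
    """Index of the highest-scoring token."""
    return max(range(len(scores)), key=scores.__getitem__)


# Toy three-token vocabulary: the two branches agree on token 0
# but disagree on token 1.
primary = [2.0, 1.9, -5.0]
contrast = [2.0, 0.5, -5.0]
adjusted = contrastive_decode_step(primary, contrast)
```

In this toy case plain greedy decoding on `primary` selects token 0, while the contrastive scores favor token 1, the token on which the two branches disagree most. The very unlikely token 2 is masked by the plausibility cutoff and can never be selected.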
Taxonomy
Topics: CCD and CMOS Imaging Sensors · COVID-19 diagnosis using AI · Image Processing Techniques and Applications
