Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding
Sicong Leng, Hang Zhang, Guanzheng Chen, Xin Li, Shijian Lu, Chunyan Miao, Lidong Bing

TL;DR
This paper introduces Visual Contrastive Decoding (VCD), a training-free method that reduces object hallucinations in large vision-language models by contrasting outputs from original and distorted images, improving factual accuracy.
Contribution
The paper presents VCD, a training-free approach that mitigates object hallucinations in LVLMs by contrasting the output distributions obtained from original and distorted visual inputs, without relying on external tools.
Findings
VCD significantly reduces object hallucinations across various LVLMs.
VCD improves accuracy on general LVLM benchmarks.
VCD does not require additional training or external tools.
Abstract
Large Vision-Language Models (LVLMs) have advanced considerably, intertwining visual recognition and language understanding to generate content that is not only coherent but also contextually attuned. Despite their success, LVLMs still suffer from the issue of object hallucinations, where models generate plausible yet incorrect outputs that include objects that do not exist in the images. To mitigate this issue, we introduce Visual Contrastive Decoding (VCD), a simple and training-free method that contrasts output distributions derived from original and distorted visual inputs. The proposed VCD effectively reduces the over-reliance on statistical bias and unimodal priors, two essential causes of object hallucinations. This adjustment ensures the generated content is closely grounded to visual inputs, resulting in contextually accurate outputs. Our experiments show that VCD, without…
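The contrast described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes the contrast is applied to next-token logits in the form `(1 + alpha) * logits_original - alpha * logits_distorted`, a form common in contrastive decoding. The function names, the `alpha` parameter, and the toy logits are all hypothetical. In a real LVLM the two logit vectors would come from two forward passes, one with the original image and one with a distorted copy.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def contrastive_distribution(logits_original, logits_distorted, alpha=1.0):
    """Contrast next-token logits from the original and the distorted image.

    `alpha` is a hypothetical contrast strength (an assumption of this
    sketch); alpha = 0 recovers ordinary decoding on the original image.
    """
    if len(logits_original) != len(logits_distorted):
        raise ValueError("both logit vectors must cover the same vocabulary")
    contrasted = [
        (1 + alpha) * lo - alpha * ld
        for lo, ld in zip(logits_original, logits_distorted)
    ]
    return softmax(contrasted)


# Toy three-token vocabulary with made-up logits (illustration only).
vocab = ["dog", "frisbee", "surfboard"]
logits_original = [2.0, 1.0, 1.5]   # model sees the clean image
logits_distorted = [0.5, 0.2, 1.6]  # model sees the distorted image

baseline = softmax(logits_original)
contrasted = contrastive_distribution(logits_original, logits_distorted)

for token, p_base, p_con in zip(vocab, baseline, contrasted):
    print(f"{token:10s} baseline={p_base:.3f} contrasted={p_con:.3f}")
```

In this toy setup "surfboard" keeps a high logit even when the image is distorted, which suggests its score comes from priors rather than visual evidence, so the contrast lowers its probability relative to ordinary decoding.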
Taxonomy
Topics: COVID-19 diagnosis using AI · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
