Delve into Visual Contrastive Decoding for Hallucination Mitigation of Large Vision-Language Models

Yi-Lun Lee; Yi-Hsuan Tsai; Wei-Chen Chiu

arXiv:2412.06775 · cs.CV · December 10, 2024

TL;DR

This paper investigates how visual contrastive decoding can reduce hallucinations in large vision-language models by analyzing various methods of altering visual inputs and combining contrastive samples, leading to improved hallucination mitigation.

Contribution

It introduces a novel analysis of visual contrastive decoding methods, including image editing and downsampling, and proposes a practical fusion approach to enhance hallucination mitigation across models and benchmarks.

Findings

1. Contrastive samples' effectiveness varies across models and benchmarks.
2. Combining contrastive samples improves hallucination mitigation.
3. Proposed fusion method enhances performance in multiple scenarios.

Abstract

While large vision-language models (LVLMs) have shown impressive capabilities in generating plausible responses correlated with input visual contents, they still suffer from hallucinations, where the generated text inaccurately reflects visual contents. To address this, recent approaches apply contrastive decoding to calibrate the model's response via contrasting output distributions with original and visually distorted samples, demonstrating promising hallucination mitigation in a training-free manner. However, the potential of changing information in visual inputs is not well-explored, so a deeper investigation into the behaviors of visual contrastive decoding is of great interest. In this paper, we first explore various methods for contrastive decoding to change visual contents, including image downsampling and editing. Downsampling images reduces the detailed textual information…
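The contrastive step the abstract describes, contrasting output distributions obtained with the original and a visually distorted input, can be illustrated with a minimal sketch. It assumes the commonly used visual contrastive decoding formulation, in which next-token logits from the original image are amplified and logits from the distorted image are subtracted, with a plausibility cutoff that keeps only tokens the original distribution already considers likely. The function names and the parameters `alpha` and `beta` are illustrative assumptions, not taken from this paper or its repository, and the paper's own variants (downsampling, editing, fusion) may differ in detail.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats (handles -inf entries)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def contrastive_distribution(logits_original, logits_distorted, alpha=1.0, beta=0.1):
    """Sketch of one visual contrastive decoding step (assumed formulation).

    logits_original:  next-token logits given the original image.
    logits_distorted: next-token logits given a distorted image
                      (for example a downsampled or edited copy).
    alpha: contrast strength (hypothetical default).
    beta:  plausibility cutoff in (0, 1], relative to the most likely
           token under the original image (hypothetical default).
    """
    if len(logits_original) != len(logits_distorted):
        raise ValueError("both logit vectors must cover the same vocabulary")

    p_original = softmax(logits_original)
    cutoff = beta * max(p_original)

    contrasted = []
    for lo, ld, p in zip(logits_original, logits_distorted, p_original):
        if p >= cutoff:
            # Amplify what the original image supports, penalize what the
            # distorted image predicts just as strongly.
            contrasted.append((1 + alpha) * lo - alpha * ld)
        else:
            # Implausible under the original image: exclude from sampling.
            contrasted.append(float("-inf"))
    return softmax(contrasted)


# Toy three-token vocabulary: the first token depends on visual detail
# (its logit drops when the image is distorted), the second does not.
original = [2.0, 1.5, -1.0]
distorted = [0.5, 1.5, -1.0]
probs = contrastive_distribution(original, distorted)
```

In this toy case the first token gains probability mass relative to plain softmax over `original`, because its score depends on the undistorted image, while the third token is removed by the plausibility cutoff.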

Code & Models

Repositories

yilunlee/vcd_analysis (PyTorch, official)

Taxonomy

Topics: Hallucinations in medical conditions