Mitigating Hallucinations in Large Vision-Language Models (LVLMs) via Language-Contrastive Decoding (LCD)

Avshalom Manevich; Reut Tsarfaty

arXiv:2408.04664 · cs.CL · August 12, 2024

TL;DR

This paper introduces Language Contrastive Decoding (LCD), a novel method that reduces object hallucinations in Large Vision-Language Models by adjusting LVLM outputs according to the confidence levels of the LLM's output distribution, leading to improved accuracy and caption quality.

Contribution

The study proposes LCD, a new decoding algorithm that mitigates hallucinations in LVLMs without retraining or complex post-processing and is applicable across different models.

Findings

1. Up to 4% improvement in POPE F1 scores
2. Up to 36% reduction in CHAIR scores
3. Enhanced captioning quality scores
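The findings are stated in terms of two standard object-hallucination metrics. POPE asks the model yes/no questions about whether an object appears in the image and scores the answers with F1. CHAIR checks the objects a generated caption mentions against the objects annotated for the image, at the instance level (CHAIR_i, the share of mentioned objects that are hallucinated) and the sentence level (CHAIR_s, the share of captions containing at least one hallucinated object); lower CHAIR is better. The sketch below is illustrative only: it assumes object mentions have already been extracted and mapped to annotation categories (the synonym matching the real benchmark performs is omitted), treats each caption's mentions as a set, and uses hypothetical function names.

```python
from typing import List, Sequence, Set, Tuple


def chair_scores(mentioned: Sequence[Set[str]],
                 ground_truth: Sequence[Set[str]]) -> Tuple[float, float]:
    """Return (CHAIR_i, CHAIR_s) for a batch of captions.

    mentioned[k]    -- objects the k-th caption mentions (already normalized)
    ground_truth[k] -- objects annotated as present in the k-th image
    """
    total_mentions = 0
    hallucinated_mentions = 0
    captions_with_hallucination = 0
    for said, present in zip(mentioned, ground_truth):
        hallucinated = said - present          # mentioned but not in the image
        total_mentions += len(said)
        hallucinated_mentions += len(hallucinated)
        captions_with_hallucination += bool(hallucinated)
    chair_i = hallucinated_mentions / total_mentions if total_mentions else 0.0
    chair_s = captions_with_hallucination / len(mentioned) if mentioned else 0.0
    return chair_i, chair_s


def pope_f1(predictions: List[bool], labels: List[bool]) -> float:
    """F1 over yes/no answers, treating 'yes' (object present) as positive."""
    tp = sum(p and l for p, l in zip(predictions, labels))
    fp = sum(p and not l for p, l in zip(predictions, labels))
    fn = sum((not p) and l for p, l in zip(predictions, labels))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

For example, a caption mentioning {dog, frisbee} for an image annotated with only a dog, plus a correct caption mentioning {cat}, gives CHAIR_i = 1/3 and CHAIR_s = 0.5.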

Abstract

Large Vision-Language Models (LVLMs) are an extension of Large Language Models (LLMs) that facilitate processing of both image and text inputs, expanding AI capabilities. However, LVLMs struggle with object hallucinations due to their reliance on text cues and learned object co-occurrence biases. While most research quantifies these hallucinations, mitigation strategies are still lacking. Our study introduces a Language Contrastive Decoding (LCD) algorithm that adjusts LVLM outputs based on LLM distribution confidence levels, effectively reducing object hallucinations. We demonstrate the advantages of LCD in leading LVLMs, showing up to 4% improvement in POPE F1 scores and up to 36% reduction in CHAIR scores on the COCO validation set, while also improving captioning quality scores. Our method effectively improves LVLMs without needing complex post-processing or retraining, and is easily…
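The abstract states only that LCD adjusts LVLM outputs based on LLM distribution confidence levels; it does not give the formula. What follows is a minimal sketch of one way such a language-contrastive decoding step could look, and every specific is our assumption rather than the paper's method: (a) the next token is scored twice, by the LVLM with the image and by a text-only LLM without it; (b) the contrast weight grows as the LLM becomes more confident (lower normalized entropy), on the intuition that a confident text-only prediction reflects language priors rather than visual evidence; (c) an adaptive plausibility cutoff, as used in earlier contrastive decoding methods, keeps the contrast from promoting tokens the LVLM itself finds unlikely. The names `lcd_scores`, `alpha`, and `plausibility` are hypothetical.

```python
import math
from typing import List


def log_softmax(logits: List[float]) -> List[float]:
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def normalized_entropy(logprobs: List[float]) -> float:
    """Entropy divided by log|V|: 0 for a one-hot distribution, 1 for uniform."""
    assert len(logprobs) >= 2, "need a vocabulary of at least two tokens"
    h = -sum(math.exp(lp) * lp for lp in logprobs)
    return h / math.log(len(logprobs))


def lcd_scores(lvlm_logits: List[float], llm_logits: List[float],
               alpha: float = 1.0, plausibility: float = 0.1) -> List[float]:
    """Hypothetical contrastive scores: (1 + w) log p_LVLM - w log p_LLM."""
    assert len(lvlm_logits) == len(llm_logits)
    lvlm_lp = log_softmax(lvlm_logits)   # LVLM conditioned on image + text
    llm_lp = log_softmax(llm_logits)     # LLM conditioned on text only
    # Assumed confidence weight: large when the LLM is certain, ~0 when uniform.
    weight = alpha * (1.0 - normalized_entropy(llm_lp))
    # Keep only tokens with p_LVLM >= plausibility * max p_LVLM.
    cutoff = math.log(plausibility) + max(lvlm_lp)
    return [
        (1.0 + weight) * a - weight * b if a >= cutoff else float("-inf")
        for a, b in zip(lvlm_lp, llm_lp)
    ]


def lcd_greedy_step(lvlm_logits: List[float], llm_logits: List[float],
                    **kwargs) -> int:
    """Pick the index of the highest contrastive score (greedy decoding)."""
    scores = lcd_scores(lvlm_logits, llm_logits, **kwargs)
    return max(range(len(scores)), key=scores.__getitem__)
```

With a toy vocabulary [cat, dog, frisbee], LVLM logits [2.0, 0.0, 2.2], and text-only LLM logits [0.0, 0.0, 4.0], plain greedy decoding picks "frisbee" (index 2), whereas `lcd_greedy_step` picks "cat" (index 0): the token the text-only model was confidently pushing is demoted. When the LLM distribution is uniform, the weight falls to zero and the scores reduce to the LVLM's own log-probabilities.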

Videos

Mitigating Hallucinations in Large Vision-Language Models (LVLMs) via Language-Contrastive Decoding (LCD)· underline
