ConVis: Contrastive Decoding with Hallucination Visualization for Mitigating Hallucinations in Multimodal Large Language Models

Yeji Park; Deokyeong Lee; Junsuk Choe; Buru Chang

arXiv:2408.13906·cs.CV·August 27, 2024

TL;DR

ConVis is a training-free contrastive decoding approach that uses image reconstruction to reduce hallucinations in multimodal large language models, improving their reliability without additional data or training.

Contribution

We propose ConVis, a novel decoding method that leverages image reconstruction for contrastive signals, effectively mitigating hallucinations in MLLMs without extra training.

Findings

01. Reduces hallucinations across five popular benchmarks and various MLLMs

02. Operates purely within the decoding process, without additional data or model updates

03. Enhances the reliability of MLLMs in multimodal tasks

Abstract

Hallucinations in Multimodal Large Language Models (MLLMs), where generated responses fail to accurately reflect the given image, pose a significant challenge to their reliability. To address this, we introduce ConVis, a novel training-free contrastive decoding method. ConVis leverages a text-to-image (T2I) generation model to semantically reconstruct the given image from hallucinated captions. By comparing the contrasting probability distributions produced by the original and reconstructed images, ConVis enables MLLMs to capture visual contrastive signals that penalize hallucination generation. Notably, this method operates purely within the decoding process, eliminating the need for additional data or model updates. Our extensive experiments on five popular benchmarks demonstrate that ConVis effectively reduces hallucinations across various MLLMs, highlighting its potential to enhance…
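
The abstract describes comparing next-token distributions conditioned on the original image and on an image reconstructed from a possibly hallucinated caption. A minimal sketch of one decoding step in that spirit follows. It assumes a generic contrastive-decoding form, (1 + alpha) * log p_original - alpha * log p_reconstructed, with a plausibility cutoff; the abstract does not state the exact formula ConVis uses, so the function names, `alpha`, `beta`, and the toy logits below are all hypothetical and chosen only for illustration.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def contrastive_step(logits_original, logits_reconstructed, alpha=1.0, beta=0.1):
    """Score next-token candidates by contrasting two conditionings.

    logits_original:      logits given the real input image.
    logits_reconstructed: logits given the T2I image rebuilt from a caption.
    alpha (assumed):      strength of the contrastive penalty.
    beta (assumed):       plausibility cutoff relative to the top token.
    """
    logp_orig = log_softmax(logits_original)
    logp_recon = log_softmax(logits_reconstructed)
    # Keep only tokens whose original probability is at least beta * max probability.
    cutoff = max(logp_orig) + math.log(beta)
    scores = []
    for lo, lr in zip(logp_orig, logp_recon):
        if lo < cutoff:
            scores.append(float("-inf"))
        else:
            # Tokens that the reconstructed image supports more strongly are penalized.
            scores.append((1 + alpha) * lo - alpha * lr)
    return scores


def greedy_pick(scores):
    """Index of the highest-scoring token."""
    return max(range(len(scores)), key=scores.__getitem__)


# Toy vocabulary and logits (made up for illustration, not from the paper).
vocab = ["dog", "frisbee", "bench"]
logits_original = [2.0, 2.1, 0.5]       # "frisbee" narrowly wins without contrast
logits_reconstructed = [1.0, 3.0, 0.5]  # rebuilt image strongly favors "frisbee"

plain_choice = vocab[greedy_pick(logits_original)]
contrastive_choice = vocab[greedy_pick(contrastive_step(logits_original, logits_reconstructed))]
print(plain_choice, contrastive_choice)
```

In this toy setup, plain greedy decoding selects "frisbee", while the contrastive scores favor "dog", because the token that the reconstructed image supports most strongly is the one that gets penalized. The cutoff keeps the subtraction from promoting tokens that were already implausible under the original image.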

Code & Models

Repositories

yejipark-m/convis
jax · Official

Taxonomy

Topics: Machine Learning in Healthcare · Mental Health via Writing · Mental Health Research Topics