CODE: Contrasting Self-generated Description to Combat Hallucination in Large Multi-modal Models

Junho Kim; Hyunjun Kim; Yeonju Kim; Yong Man Ro

arXiv:2406.01920 · cs.CV · June 5, 2024 · 2 cites

TL;DR

This paper introduces CODE, a contrastive decoding method that uses self-generated descriptions to reduce hallucinations and improve response accuracy in large multi-modal models, without requiring extra training.

Contribution

The paper presents a novel contrastive decoding approach that leverages self-generated descriptions to enhance coherence and reduce hallucinations in LMMs during inference.

Findings

01 Significantly reduces hallucinations in LMM outputs

02 Improves cross-modal consistency across benchmarks

03 Can be integrated into existing models without retraining

Abstract

Large Multi-modal Models (LMMs) have recently demonstrated remarkable abilities in visual context understanding and coherent response generation. Alongside these advances, however, hallucination has emerged as a significant challenge: models produce erroneous responses that are unrelated to the visual content. In this paper, we introduce a novel contrastive decoding method, COuntering DEscription Contrastive Decoding (CODE), which leverages self-generated descriptions as contrasting references during the decoding phase of LMMs to address hallucination. CODE uses the comprehensive description produced by the model itself as a visual counterpart to correct the response and improve its alignment with the actual visual content. By dynamically adjusting the information flow and distribution of next-token predictions in the LMM's vocabulary, CODE enhances the coherence and…
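As a rough illustration of the contrastive decoding idea the abstract describes, the sketch below picks a next token by contrasting two sets of logits: one conditioned on the image and one conditioned on the model's own description of it. This follows the generic contrastive decoding recipe and is not taken from the paper. The function names, the fixed contrast weight `alpha`, and the fixed plausibility threshold `beta` are all assumptions made for illustration. The abstract says CODE adjusts these quantities dynamically, which this sketch does not attempt.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def contrastive_next_token(logits_image, logits_description, alpha=1.0, beta=0.1):
    """Greedy next-token choice under a generic contrastive decoding rule.

    logits_image:       next-token logits when the model sees the image
    logits_description: next-token logits when the image is replaced by the
                        model's self-generated description (hypothetical setup)
    alpha:              contrast strength (assumed fixed here)
    beta:               plausibility threshold (assumed fixed here)
    """
    lp_img = log_softmax(logits_image)
    lp_desc = log_softmax(logits_description)

    # Keep only tokens whose image-conditioned probability is at least
    # beta times the probability of the most likely token.
    cutoff = math.log(beta) + max(lp_img)

    best_token, best_score = None, -math.inf
    for token, (lp_i, lp_d) in enumerate(zip(lp_img, lp_desc)):
        if lp_i < cutoff:
            continue
        # Amplify what the image supports, penalise what the description alone predicts.
        score = (1.0 + alpha) * lp_i - alpha * lp_d
        if score > best_score:
            best_token, best_score = token, score
    return best_token


if __name__ == "__main__":
    image_logits = [2.0, 1.9, -5.0]
    description_logits = [2.0, 0.0, 3.0]
    print(contrastive_next_token(image_logits, description_logits, alpha=0.0))
    print(contrastive_next_token(image_logits, description_logits, alpha=1.0))
```

With `alpha=0.0` the rule reduces to ordinary greedy decoding on the image-conditioned logits. Raising `alpha` shifts the choice toward tokens that the image supports more strongly than the description does, while the `beta` cutoff keeps implausible tokens from being selected only because the description branch dislikes them.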

Videos

CODE: Contrasting Self-generated Description to Combat Hallucination in Large Multi-modal Models · SlidesLive

Taxonomy

Topics: Anomaly Detection Techniques and Applications · Cell Image Analysis Techniques