CODE: Contrasting Self-generated Description to Combat Hallucination in Large Multi-modal Models
Junho Kim, Hyunjun Kim, Yeonju Kim, Yong Man Ro

TL;DR
This paper introduces CODE, a contrastive decoding method that uses self-generated descriptions to reduce hallucinations and improve response accuracy in large multi-modal models, without requiring extra training.
Contribution
The paper presents a novel contrastive decoding approach that leverages self-generated descriptions to enhance coherence and reduce hallucinations in LMMs during inference.
Findings
Significantly reduces hallucinations in LMM outputs
Improves cross-modal consistency across benchmarks
Can be integrated into existing models without retraining
Abstract
Large Multi-modal Models (LMMs) have recently demonstrated remarkable abilities in visual context understanding and coherent response generation. However, alongside these advancements, the issue of hallucinations has emerged as a significant challenge, producing erroneous responses that are unrelated to the visual contents. In this paper, we introduce a novel contrastive-based decoding method, COuntering DEscription Contrastive Decoding (CODE), which leverages self-generated descriptions as contrasting references during the decoding phase of LMMs to address hallucination issues. CODE utilizes the comprehensive descriptions from the model itself as a visual counterpart to correct and improve response alignment with the actual visual content. By dynamically adjusting the information flow and distribution of next-token predictions in the LMM's vocabulary, CODE enhances the coherence and…
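The abstract describes contrasting next-token predictions conditioned on the image against predictions conditioned on the model's own description. The sketch below illustrates one decoding step of generic contrastive decoding under stated assumptions; it is not the paper's exact formulation, which the truncated abstract does not give. The function name `contrastive_next_token`, the weight `alpha`, the plausibility threshold `beta`, and the fixed (non-dynamic) combination rule are all hypothetical choices made for illustration.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def contrastive_next_token(logits_image, logits_description, alpha=1.0, beta=0.1):
    """Pick the next token by contrasting two sets of logits.

    logits_image:       logits when the model is conditioned on the real image.
    logits_description: logits when the image is replaced by the model's own
                        self-generated description (assumed to be precomputed).
    alpha:              contrast strength (hypothetical fixed value).
    beta:               plausibility threshold relative to the most likely
                        image-conditioned token (hypothetical fixed value).
    """
    if len(logits_image) != len(logits_description):
        raise ValueError("both logit vectors must cover the same vocabulary")

    logp_img = log_softmax(logits_image)
    logp_desc = log_softmax(logits_description)

    # Keep only tokens the image-conditioned model itself finds plausible,
    # so the contrast cannot promote tokens that are unlikely to begin with.
    cutoff = max(logp_img) + math.log(beta)

    scores = []
    for lp_img, lp_desc in zip(logp_img, logp_desc):
        if lp_img >= cutoff:
            # Amplify the image-conditioned evidence and penalize what the
            # description-conditioned branch would have predicted anyway.
            scores.append((1.0 + alpha) * lp_img - alpha * lp_desc)
        else:
            scores.append(-math.inf)

    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Toy vocabulary of three tokens: the description-conditioned branch strongly
# favors token 0, while the image-conditioned branch is nearly undecided.
token, scores = contrastive_next_token([2.0, 1.9, -3.0], [2.5, 0.5, -3.0])
print(token)
```

In this toy case greedy decoding on the image-conditioned logits alone would select token 0, whereas the contrast shifts the choice to token 1, and the implausible third token is masked out. Setting `alpha=0` recovers ordinary greedy decoding.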
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Cell Image Analysis Techniques
