Mitigating Hallucination for Large Vision Language Model by Inter-Modality Correlation Calibration Decoding
Jiaming Li, Jiacheng Zhang, Zequn Jie, Lin Ma, Guanbin Li

TL;DR
This paper introduces a training-free method called IMCCD to reduce hallucinations in large vision-language models by calibrating inter-modality correlations, improving the consistency between visual inputs and generated content.
Contribution
The paper proposes a novel inter-modality correlation calibration decoding method with modules for contrastive decoding and attention refinement, addressing hallucinations from spurious correlations in LVLMs.
Findings
Outperforms existing methods on hallucination-reduction benchmarks
Effectively mitigates overreliance on language priors and misleading correlations
Enhances focus on important visual content during decoding
Abstract
Large vision-language models (LVLMs) have shown remarkable capabilities in visual-language understanding for downstream multi-modal tasks. Despite their success, LVLMs still suffer from generating hallucinations in complex generation tasks, leading to inconsistencies between visual inputs and generated content. To address this issue, some approaches have introduced inference-time interventions, such as contrastive decoding and attention rectification, to reduce overreliance on language priors. However, these approaches overlook hallucinations stemming from spurious inter-modality correlations. In this paper, we propose an Inter-Modality Correlation Calibration Decoding (IMCCD) method to mitigate hallucinations in LVLMs in a training-free manner. In this method, we design a Cross-Modal Value-Enhanced Decoding (CMVED) module to alleviate hallucination by a novel contrastive decoding…
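For readers unfamiliar with the contrastive decoding the abstract refers to, the following is a minimal sketch of the generic idea as used in earlier inference-time methods. It is not the paper's CMVED module, whose details are cut off above. The function names, the toy vocabulary, the logit values, and the parameters `alpha` (contrast strength) and `beta` (plausibility cutoff) are all illustrative assumptions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def contrastive_decode_step(logits_full, logits_contrast, alpha=1.0, beta=0.1):
    """Pick the next token index by contrasting two next-token distributions.

    logits_full:     logits computed with the original visual input.
    logits_contrast: logits computed with a degraded or distorted input
                     (hypothetical; how it is built depends on the method).
    alpha:           assumed contrast strength.
    beta:            assumed plausibility cutoff relative to the top token.
    """
    probs_full = softmax(logits_full)
    cutoff = beta * max(probs_full)
    adjusted = []
    for lf, lc, pf in zip(logits_full, logits_contrast, probs_full):
        if pf >= cutoff:
            # Amplify what the full input supports, subtract what the
            # contrast branch (mostly priors) would predict anyway.
            adjusted.append((1 + alpha) * lf - alpha * lc)
        else:
            # Tokens the full-input model finds implausible are excluded.
            adjusted.append(float("-inf"))
    return max(range(len(adjusted)), key=lambda i: adjusted[i])


# Toy example with made-up numbers: the contrast branch still favors
# "frisbee", suggesting that token is driven by priors rather than the image.
vocab = ["dog", "frisbee", "cat"]
logits_full = [2.0, 2.2, -3.0]
logits_contrast = [0.5, 2.4, -3.0]

greedy = max(range(len(vocab)), key=lambda i: logits_full[i])
chosen = contrastive_decode_step(logits_full, logits_contrast)
print(vocab[greedy])  # frisbee
print(vocab[chosen])  # dog
```

In this toy case plain greedy decoding picks "frisbee", while the contrasted scores favor "dog" because the contrast branch contributes little support for it. The cutoff keeps very unlikely tokens from being promoted just because the contrast branch scores them even lower.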
Taxonomy
Topics: Brain Tumor Detection and Classification · Image Retrieval and Classification Techniques · Fractal and DNA sequence analysis
Methods: Softmax · Attention Is All You Need · Focus
