MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation
Chenxi Wang, Xiang Chen, Ningyu Zhang, Bozhong Tian, Haoming Xu, Shumin Deng, Huajun Chen

TL;DR
This paper investigates why multimodal large language models (MLLMs) hallucinate and introduces DeCo, a dynamic correction decoding method that adaptively integrates visual recognition signals from preceding layers into the final output to significantly reduce hallucinations.
Contribution
The paper presents a novel, model-agnostic decoding strategy, DeCo, that mitigates hallucinations in MLLMs by dynamically integrating visual recognition information during output generation.
Findings
DeCo significantly reduces hallucination rates on benchmark datasets.
MLLMs recognize visual objects in earlier layers despite producing hallucinations in their final outputs.
DeCo can be integrated with various decoding strategies and models.
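The layer-wise finding depends on reading next-token probabilities out of intermediate layers. One common way to do this is an early-exit ("logit lens") readout: pass each layer's hidden state through the final norm and the shared LM head. The following is a minimal sketch with toy numbers, not the authors' probing code; the parameter-free RMSNorm stands in for a real model's final norm, and all function names and values are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # one row per vocabulary entry


def rms_norm(h: Vector, eps: float = 1e-6) -> Vector:
    """Parameter-free RMSNorm standing in for the model's final norm (an assumption)."""
    scale = math.sqrt(sum(x * x for x in h) / len(h) + eps)
    return [x / scale for x in h]


def softmax(logits: Vector) -> Vector:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def early_exit_probs(hidden_states: List[Vector], unembed: Matrix) -> List[Vector]:
    """Project every layer's hidden state through the shared LM head.

    Returns one next-token distribution per layer.
    """
    dists = []
    for h in hidden_states:
        h_norm = rms_norm(h)
        logits = [sum(w * x for w, x in zip(row, h_norm)) for row in unembed]
        dists.append(softmax(logits))
    return dists


def token_trajectory(hidden_states: List[Vector], unembed: Matrix, token_id: int) -> List[float]:
    """Probability assigned to `token_id` at each layer."""
    return [dist[token_id] for dist in early_exit_probs(hidden_states, unembed)]


# Toy example: 3-token vocabulary, 2-dim hidden states, 3 layers (hypothetical values).
unembed = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
hidden = [[2.0, 0.1], [1.5, 1.0], [0.2, 2.0]]
dists = early_exit_probs(hidden, unembed)
traj = token_trajectory(hidden, unembed, token_id=0)
```

In the toy example, token 0 dominates the first layer and loses to token 1 by the last layer, loosely mimicking the pattern described in the findings, where an object recognized early is overridden near the output.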
Abstract
Multimodal Large Language Models (MLLMs) frequently exhibit hallucination phenomena, but the underlying reasons remain poorly understood. In this paper, we present an empirical analysis and find that, although MLLMs incorrectly generate the objects in the final output, they are actually able to recognize visual objects in the preceding layers. We speculate that this may be due to the strong knowledge priors of the language model suppressing the visual information, leading to hallucinations. Motivated by this, we propose DeCo, a novel dynamic correction decoding method for MLLMs, which adaptively selects the appropriate preceding layers and proportionally integrates their knowledge into the final layer to adjust the output logits. Note that DeCo is model-agnostic and can be seamlessly combined with various classic decoding strategies and applied to different MLLMs. We evaluate DeCo on…
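The correction step in the abstract (select a preceding layer, then proportionally mix its knowledge into the final logits) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: we assume the anchor layer is the one, within a given layer interval, that assigns the highest probability to any of the final layer's top-k candidate tokens, and that the correction adds the anchor layer's logits scaled by α times that probability. The names `select_anchor_layer`, `corrected_logits`, `greedy_token`, `layer_range`, and `top_k` are hypothetical, and the logits are toy values.

```python
import math
from typing import List, Tuple

Logits = List[float]


def softmax(logits: Logits) -> Logits:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_anchor_layer(
    layer_logits: List[Logits], layer_range: Tuple[int, int], top_k: int
) -> Tuple[int, float]:
    """Pick the preceding layer most confident about the final layer's top-k candidates.

    Assumption: confidence = max probability the layer assigns to any candidate token.
    """
    lo, hi = layer_range  # inclusive interval of preceding layers (hypothetical choice)
    if not 0 <= lo <= hi < len(layer_logits) - 1:
        raise ValueError("layer_range must lie strictly before the final layer")
    final = layer_logits[-1]
    candidates = sorted(range(len(final)), key=lambda i: final[i], reverse=True)[:top_k]
    best_layer, best_prob = lo, -1.0
    for layer in range(lo, hi + 1):
        probs = softmax(layer_logits[layer])
        prob = max(probs[c] for c in candidates)
        if prob > best_prob:
            best_layer, best_prob = layer, prob
    return best_layer, best_prob


def corrected_logits(
    layer_logits: List[Logits], layer_range: Tuple[int, int], alpha: float, top_k: int
) -> Logits:
    """Add the anchor layer's logits to the final logits, weighted by alpha * confidence."""
    anchor, confidence = select_anchor_layer(layer_logits, layer_range, top_k)
    weight = alpha * confidence  # soft modulation: a more confident anchor contributes more
    return [f + weight * a for f, a in zip(layer_logits[-1], layer_logits[anchor])]


def greedy_token(logits: Logits) -> int:
    """Greedy decoding; any other sampler could consume the corrected logits instead."""
    return max(range(len(logits)), key=lambda i: logits[i])


# Toy vocabulary ["chair", "lift", "table"]; an early layer favours "lift",
# while the final layer (index 3) favours "chair".
layers = [
    [0.0, 0.0, 0.0],
    [0.5, 3.0, 0.0],
    [1.0, 2.0, 0.0],
    [2.0, 1.5, -1.0],
]
anchor, conf = select_anchor_layer(layers, layer_range=(1, 2), top_k=2)
baseline = greedy_token(layers[-1])
corrected = greedy_token(corrected_logits(layers, layer_range=(1, 2), alpha=0.6, top_k=2))
```

Here the uncorrected final layer picks "chair", while the corrected logits pick "lift". With α = 0 the final-layer prediction is unchanged; larger α pulls the output further toward the anchor layer. Because the correction only touches logits, the same step could sit in front of greedy search, sampling, or beam search, which is consistent with the claim that DeCo composes with classic decoding strategies.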
Peer Reviews
Decision: ICLR 2025 Poster
- The authors demonstrate through probing experiments that MLLMs can recognize objects in earlier layers but tend to “forget” this information in deeper layers due to language model priors, leading to hallucinations. This insight offers a novel layer-wise perspective on the hallucination mechanism in MLLMs.
- The figures illustrating token probabilities across transformer layers effectively highlight the trends for hallucinated versus non-hallucinated tokens, making the analysis accessible and i…
- In Figure 9, the response includes awkward repetition, with "The horse statue is positioned on top of the chair" stated multiple times. This raises questions about whether the chosen α value is effective at avoiding repetitive language, since the authors indicated that high α values could increase repetition.
- In Figure 10, DeCo corrects a significant hallucination (misidentifying a lift as a "chair"), but the output still contains a hallucination about "several other people visible i…
1. The motivation seems interesting.
2. The paper is well written and easy to follow. The diagrams are essential to understanding this paper.
3. This paper achieves good results on existing datasets.
4. The main technical pipeline is clear.
1. Although the experiments indicate improved performance in preceding layers, I am concerned about the coherence and richness of the text generated at these stages. Could you provide further evaluation metrics for text quality, such as BLEU or other relevant scores?
2. In Figure 1(b), the interval [10, 20] appears optimal, yet in Figure 7(b), [17, 28] shows better performance. Could you clarify this discrepancy?
3. Could you provide more evidence to demonstrate how dynamic soft modulation preve…
- This work makes an interesting observation about how visual information exists in intermediate layers and is then overridden by knowledge priors closer to the output.
- The proposed mitigation method is lightweight and efficient.
- The experimental results are generally better than the baselines.
Although the presentation focuses on image-conditioned generative language models, the methodology behind findings 1 and 2, as well as the proposed layer selection and probability correction, is modality-agnostic. The findings are mostly empirical, and it is unclear whether this is a general phenomenon for other models of the same size, or for models of other sizes. There is a substantial body of literature studying LLMs' internal representations and hallucination, only selectively listing a fe…
