KVSmooth: Mitigating Hallucination in Multi-modal Large Language Models through Key-Value Smoothing
Siyu Jiang, Feiyang Chen, Xiaojin Zhang, and Kun He

TL;DR
KVSmooth is a training-free, plug-and-play method that reduces hallucinations in multimodal large language models by adaptively smoothing the cached key and value states, guided by attention entropy, improving factual consistency without retraining.
Contribution
We introduce KVSmooth, a novel inference-time technique that mitigates hallucination in MLLMs through attention-entropy-guided smoothing, without requiring retraining or model modification.
Findings
Reduces the CHAIR hallucination score (lower is better) from 41.8 to 18.2.
Improves the F1 score (the harmonic mean of precision and recall) from 77.5 to 79.2.
Operates efficiently during inference without additional training.
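The findings above cite CHAIR and F1 without defining them. As a reminder of what those numbers measure, here is a minimal Python sketch of the two standard CHAIR (Caption Hallucination Assessment with Image Relevance) variants and of F1. CHAIR_S is the share of captions that contain at least one hallucinated object; CHAIR_I is the share of object mentions that are hallucinated. The sketch is simplified: it assumes object names have already been extracted and mapped to canonical labels, and it counts each object once per caption. This summary does not say which CHAIR variant or which benchmark's F1 the reported numbers use, and the function names are illustrative.

```python
from typing import List, Set, Tuple


def chair_scores(mentioned: List[Set[str]], present: List[Set[str]]) -> Tuple[float, float]:
    """Return (CHAIR_S, CHAIR_I) in percent; lower means fewer hallucinations.

    mentioned[i]: canonical object names extracted from generated caption i
    present[i]:   ground-truth object names for image i
    """
    bad_captions = bad_mentions = total_mentions = 0
    for objs, truth in zip(mentioned, present):
        hallucinated = objs - truth  # mentioned in the caption but not in the image
        total_mentions += len(objs)
        bad_mentions += len(hallucinated)
        bad_captions += bool(hallucinated)  # caption counts once if any object is hallucinated
    chair_s = 100.0 * bad_captions / max(len(mentioned), 1)
    chair_i = 100.0 * bad_mentions / max(total_mentions, 1)
    return chair_s, chair_i


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


captions = [{"dog", "frisbee"}, {"cat", "sofa", "remote"}]
ground_truth = [{"dog", "frisbee", "grass"}, {"cat", "sofa"}]
print(chair_scores(captions, ground_truth))  # → (50.0, 20.0)
```

In the toy data, only the second caption hallucinates an object ("remote"), so half the captions are affected (CHAIR_S = 50.0) while one of five object mentions is wrong (CHAIR_I = 20.0).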
Abstract
Despite the significant progress of Multimodal Large Language Models (MLLMs) across diverse tasks, hallucination (the generation of objects, attributes, or relations that are inconsistent with the visual input) remains a major obstacle to their reliable deployment. Unlike pure language models, MLLMs must ground their generation process in visual inputs. However, existing models often suffer from semantic drift during decoding, causing outputs to diverge from visual facts as the sequence length increases. To address this issue, we propose KVSmooth, a training-free and plug-and-play method that mitigates hallucination by performing attention-entropy-guided adaptive smoothing on hidden states. Specifically, KVSmooth applies an exponential moving average (EMA) to both keys and values in the KV-Cache, while dynamically quantifying the sink degree of each token through the entropy of its…
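The abstract describes the mechanism only in outline, and it is truncated before the sink-degree definition is complete, so the following is a minimal NumPy sketch under stated assumptions, not the authors' implementation. It assumes (hypothetically) that each token's sink degree is derived from the entropy of its own causal attention row, that lower normalized entropy maps linearly to a larger EMA weight `beta_t` capped at a hypothetical `beta_max`, and that the EMA runs along the sequence axis of a single head's cached keys and values. The names `row_entropy`, `smoothing_weights`, and `ema_smooth_kv` are illustrative.

```python
import numpy as np


def row_entropy(p: np.ndarray) -> float:
    """Shannon entropy (in nats) of one attention distribution."""
    p = p[p > 0]  # treat 0 * log(0) as 0
    return float(-(p * np.log(p)).sum())


def smoothing_weights(attn: np.ndarray, beta_max: float = 0.5) -> np.ndarray:
    """HYPOTHETICAL mapping from attention entropy to a per-token EMA weight.

    attn[t, :t+1] is token t's causal attention distribution. Its entropy is
    normalized by log(t + 1), the maximum possible, so it lies in [0, 1]. We
    ASSUME lower entropy (more concentrated attention) means a higher sink
    degree and therefore stronger smoothing; the paper's rule may differ.
    """
    T = attn.shape[0]
    betas = np.zeros(T)
    for t in range(1, T):  # token 0 has no history to average with
        h = row_entropy(attn[t, : t + 1]) / np.log(t + 1)
        betas[t] = beta_max * (1.0 - np.clip(h, 0.0, 1.0))
    return betas


def ema_smooth_kv(keys: np.ndarray, values: np.ndarray, betas: np.ndarray):
    """EMA along the sequence axis: s_t = beta_t * s_{t-1} + (1 - beta_t) * x_t."""
    k_s = np.array(keys, dtype=float)  # copies, so the input cache is untouched
    v_s = np.array(values, dtype=float)
    for t in range(1, k_s.shape[0]):
        # Right-hand side reads the already-smoothed t-1 row and the raw t row.
        k_s[t] = betas[t] * k_s[t - 1] + (1.0 - betas[t]) * k_s[t]
        v_s[t] = betas[t] * v_s[t - 1] + (1.0 - betas[t]) * v_s[t]
    return k_s, v_s


# Toy single-head example: random keys/values and causal softmax attention.
rng = np.random.default_rng(0)
T, d = 6, 4
keys = rng.normal(size=(T, d))
values = rng.normal(size=(T, d))
scores = keys @ keys.T / np.sqrt(d)  # stand-in for q.k scores (queries = keys here)
scores[np.triu(np.ones((T, T), dtype=bool), k=1)] = -np.inf  # causal mask
attn = np.exp(scores - scores.max(axis=1, keepdims=True))
attn /= attn.sum(axis=1, keepdims=True)

betas = smoothing_weights(attn, beta_max=0.5)
k_smooth, v_smooth = ema_smooth_kv(keys, values, betas)
```

With uniform attention every `beta_t` is 0 and the cache is left unchanged; with fully concentrated attention `beta_t` reaches `beta_max`. In a real MLLM this would run per layer and per head on the KV-cache during decoding, and the entropy-to-weight mapping is the part most likely to differ from the paper.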
