Context-Aware Decoding for Faithful Vision-Language Generation

Mehrdad Fazli; Bowen Wei; Ziwei Zhu

arXiv:2601.05939 · cs.CV · January 12, 2026

TL;DR

This paper analyzes the layer-wise generation process in vision-language models to understand where hallucinations arise. It then introduces Context Embedding Injection (CEI), a training-free method that reduces hallucinations by using context embeddings as grounding signals during decoding.

Contribution

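The paper identifies a commitment-depth gap in large vision-language models (LVLMs): truthful tokens settle on their final prediction at earlier decoder layers than hallucinatory ones. Building on this, it proposes CEI, a lightweight, training-free technique that significantly reduces hallucinations in vision-language generation.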

Findings

1. CEI outperforms state-of-the-art baselines on multiple benchmarks.
2. Dynamic CEI achieves the lowest hallucination rates.
3. Layer-wise analysis shows that truthful tokens commit to their final candidates at earlier layers than hallucinatory ones.
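
The layer-wise analysis relies on the Logit Lens: each intermediate hidden state is passed through the model's final normalization and unembedding matrix to read off a next-token distribution at that depth. The Python/NumPy sketch below makes two assumptions of its own: the final norm is an RMS norm without its learned scale, and `commitment_depth` is a hypothetical measure (the earliest layer from which the final prediction's probability stays above a threshold). The paper's exact metric is not given in this excerpt.

```python
import numpy as np


def softmax(x: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def rms_norm(h: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """RMS normalization without a learned scale (assumption for this sketch)."""
    return h / np.sqrt(np.mean(h ** 2, axis=-1, keepdims=True) + eps)


def logit_lens(hidden_states: np.ndarray, w_unembed: np.ndarray) -> np.ndarray:
    """Decode every layer's hidden state at one position into a vocab distribution.

    hidden_states: (num_layers, d_model) residual-stream states at one position
    w_unembed:     (d_model, vocab) output (unembedding) matrix
    returns:       (num_layers, vocab) per-layer next-token probabilities
    """
    return softmax(rms_norm(hidden_states) @ w_unembed)


def commitment_depth(layer_probs: np.ndarray, threshold: float = 0.5) -> int:
    """Hypothetical metric: earliest layer from which the final layer's top-1
    token keeps probability >= threshold at every deeper layer.
    Returns num_layers if the final layer itself is below threshold."""
    final_token = int(np.argmax(layer_probs[-1]))
    mass = layer_probs[:, final_token]
    depth = len(mass)
    # Walk backwards from the last layer while the mass stays above threshold.
    for layer in range(len(mass) - 1, -1, -1):
        if mass[layer] < threshold:
            break
        depth = layer
    return depth


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    probs = logit_lens(rng.normal(size=(32, 64)), rng.normal(size=(64, 100)))
    print(probs.shape, commitment_depth(probs))
```

Under this hypothetical measure, a lower depth means earlier commitment. The paper's finding corresponds to truthful tokens tending to have lower values than hallucinatory ones.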

Abstract

Hallucinations, i.e. responses inconsistent with the visual input, remain a critical limitation of large vision-language models (LVLMs), especially in open-ended tasks such as image captioning and visual reasoning. In this work, we probe the layer-wise generation dynamics that drive hallucinations and propose a training-free mitigation strategy. Using the Logit Lens, we examine how LVLMs construct next-token distributions across decoder layers and uncover a pronounced commitment-depth gap: truthful tokens accumulate probability mass on their final candidates earlier than hallucinatory ones. Building on this discovery, we introduce Context Embedding Injection (CEI), a lightweight method that uses the hidden state of the last input token (the context embedding) as a grounding signal to maintain visual fidelity throughout decoding and curb hallucinations. Evaluated on the…
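
CEI reuses the hidden state of the last input token as a grounding signal during decoding. The excerpt does not state the injection rule, so the sketch below assumes one for illustration. At a single hypothetical layer `inject_layer`, each decoding step's hidden state h is blended with the cached context embedding c as h' = (1 - alpha) h + alpha c. `ToyDecoder`, `inject_context`, and `greedy_decode_with_cei` are hypothetical names. The toy model is a residual stack with no attention, not an LVLM.

```python
import numpy as np


def inject_context(hidden: np.ndarray, context_emb: np.ndarray, alpha: float) -> np.ndarray:
    """Hypothetical CEI step: convex blend of the current hidden state with
    the cached context embedding (alpha = 0 leaves the state unchanged)."""
    return (1.0 - alpha) * hidden + alpha * context_emb


class ToyDecoder:
    """Stand-in for an LVLM decoder: a residual stack of random tanh layers.
    It has no attention, so it only illustrates where the injection happens."""

    def __init__(self, num_layers: int, d_model: int, vocab: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.embed = rng.normal(scale=0.5, size=(vocab, d_model))
        self.layers = [
            rng.normal(scale=d_model ** -0.5, size=(d_model, d_model))
            for _ in range(num_layers)
        ]
        self.unembed = rng.normal(size=(d_model, vocab))

    def forward(self, token_id, context_emb=None, inject_layer=None, alpha=0.0):
        """Return (logits, per-layer hidden states) for a single token."""
        h = self.embed[token_id]
        states = []
        for i, w in enumerate(self.layers):
            h = h + np.tanh(h @ w)
            if context_emb is not None and i == inject_layer:
                h = inject_context(h, context_emb, alpha)
            states.append(h)
        return h @ self.unembed, states


def greedy_decode_with_cei(model, prompt_ids, steps, inject_layer, alpha):
    """Greedy decoding that re-injects the cached context embedding at every step."""
    # Context embedding: hidden state of the LAST input token at inject_layer.
    # A real LVLM would take it from the prefill pass over the full image+text
    # prompt; the toy model has no attention, so only the last token is run.
    _, states = model.forward(prompt_ids[-1])
    context_emb = states[inject_layer]
    tokens, tok = [], prompt_ids[-1]
    for _ in range(steps):
        logits, _ = model.forward(tok, context_emb, inject_layer, alpha)
        tok = int(np.argmax(logits))
        tokens.append(tok)
    return tokens


if __name__ == "__main__":
    model = ToyDecoder(num_layers=8, d_model=32, vocab=50)
    print(greedy_decode_with_cei(model, [3, 7, 11], steps=5, inject_layer=4, alpha=0.3))
```

Setting `alpha = 0` recovers ordinary greedy decoding, which makes the injection easy to ablate. The excerpt does not describe the rule behind Dynamic CEI, so that variant is not sketched.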

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Adversarial Robustness in Machine Learning