Curing Semantic Drift: A Dynamic Approach to Grounding Generation in Large Vision-Language Models
Jiahe Chen, Jiaying He, Qiyuan Chen, Qian Shao, Jiahe Ying, Hongxia Xu, Jintai Chen, Jianwei Zheng, Jian Wu

TL;DR
This paper introduces DLC, a training-free decoding framework that dynamically calibrates token logits in large vision-language models to reduce semantic drift and hallucinations, improving grounding fidelity without sacrificing response quality.
Contribution
The paper proposes DLC, a novel visual calibration method that intervenes during decoding to improve grounding in LVLMs, addressing semantic drift without additional training.
Findings
DLC reduces hallucinations across multiple LVLMs.
It maintains response quality while improving grounding fidelity.
The method is robust to different vision backbones.
Abstract
Large Vision-Language Models (LVLMs) face a tug-of-war between powerful linguistic priors and visual evidence, often leading to semantic drift: a progressive detachment from the input image that can abruptly emerge at specific decoding steps. Through a token-level diagnosis, we show that hallucination is frequently triggered not by the absence of grounded candidates, but by a failure of selection: the model chooses a linguistically convenient yet visually unfaithful token even when better grounded alternatives exist. Motivated by this insight, we propose Dynamic Logits Calibration (DLC), a training-free decoding framework that introduces a lightweight visual referee to intervene exactly when drift happens. At each step, DLC performs a dual-aspect grounding check on top-k candidates: (1) it assesses the intrinsic visual relevance of a candidate token…
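The calibration step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract shown here is truncated, so only the first aspect (a per-candidate visual relevance score) is modeled. The additive update, the weight `alpha`, and the `relevance` callable standing in for the visual referee are all assumptions made for illustration.

```python
def calibrate_top_k(logits, relevance, k=5, alpha=1.0):
    """Add a visual-relevance bonus to the k highest-logit candidates.

    logits:    list of floats, one per vocabulary token.
    relevance: hypothetical callable, token_id -> score in [0, 1]; a
               stand-in for the "visual referee" (an assumption).
    alpha:     hypothetical weight on the bonus (an assumption).
    """
    # Indices of the k candidates with the largest raw logits.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    calibrated = list(logits)
    for i in top:
        # Assumed additive rule: better-grounded candidates are boosted more.
        calibrated[i] = logits[i] + alpha * relevance(i)
    return calibrated


def greedy_pick(logits):
    """Return the index of the highest-scoring token."""
    return max(range(len(logits)), key=lambda i: logits[i])


# Toy example: token 0 is linguistically favored, token 1 is better grounded.
raw_logits = [2.0, 1.8, 0.5, -1.0]
toy_relevance = {0: 0.1, 1: 0.9, 2: 0.2, 3: 0.0}

before = greedy_pick(raw_logits)
after = greedy_pick(calibrate_top_k(raw_logits, toy_relevance.get, k=3))
```

In this toy case the raw logits select token 0, while the calibrated logits select token 1, which mirrors the "failure of selection" the abstract describes: the grounded candidate was already in the top-k and only needed to be re-ranked.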
Taxonomy
Topics: Brain Tumor Detection and Classification · Functional Brain Connectivity Studies · Cell Image Analysis Techniques
Methods: Contrastive Language-Image Pre-training · ALIGN
