Curing Semantic Drift: A Dynamic Approach to Grounding Generation in Large Vision-Language Models

Jiahe Chen; Jiaying He; Qiyuan Chen; Qian Shao; Jiahe Ying; Hongxia Xu; Jintai Chen; Jianwei Zheng; Jian Wu

arXiv:2506.21509·cs.CV·March 17, 2026

TL;DR

This paper introduces DLC, a training-free decoding framework that dynamically calibrates token logits in large vision-language models to reduce semantic drift and hallucinations, improving grounding fidelity without sacrificing response quality.

Contribution

The paper proposes DLC, a novel visual calibration method that intervenes during decoding to improve grounding in LVLMs, addressing semantic drift without additional training.

Findings

1. DLC reduces hallucinations across multiple LVLMs.
2. DLC maintains response quality while improving grounding fidelity.
3. DLC is robust to different vision backbones.

Abstract

Large Vision-Language Models (LVLMs) face a tug-of-war between powerful linguistic priors and visual evidence, often leading to semantic drift: a progressive detachment from the input image that can abruptly emerge at specific decoding steps. Through a token-level diagnosis, we show that hallucination is frequently triggered not by the absence of grounded candidates, but by a failure of selection: the model chooses a linguistically convenient yet visually unfaithful token even when better grounded alternatives exist. Motivated by this insight, we propose Dynamic Logits Calibration (DLC), a training-free decoding framework that introduces a lightweight visual referee to intervene exactly when drift happens. At each step, DLC performs a dual-aspect grounding check on top-k candidates: (1) it assesses the intrinsic visual relevance of a candidate token…
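
The abstract is truncated here, so the exact scoring rule is not available. As a rough illustration of the general idea of re-scoring only the top-k candidates at a decoding step, the sketch below adds a weighted relevance score to the k highest logits and leaves the rest untouched. Everything in it is an assumption made for illustration: the function names `calibrate_top_k` and `greedy_pick`, the parameters `k` and `alpha`, the toy vocabulary, and the hand-written relevance values, which stand in for whatever the visual referee would compute. It is not the authors' implementation.

```python
def calibrate_top_k(logits, relevance, k=3, alpha=1.0):
    """Return a copy of `logits` with alpha * relevance[i] added to the
    k highest-scoring entries. All other entries are left unchanged.
    (Hypothetical helper; the paper's actual rule is not shown in the abstract.)"""
    # Indices of the k candidates with the largest logits.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    adjusted = list(logits)
    for i in top:
        adjusted[i] += alpha * relevance[i]
    return adjusted


def greedy_pick(logits):
    """Index of the highest-scoring token (greedy decoding)."""
    return max(range(len(logits)), key=lambda i: logits[i])


# Toy decoding step with made-up numbers.
vocab = ["dog", "cat", "frisbee", "the"]
logits = [2.0, 1.8, 0.5, -1.0]        # language prior slightly prefers "dog"
relevance = [-0.5, 0.6, 0.9, 0.0]     # assumed referee scores: the image shows a cat

before = vocab[greedy_pick(logits)]
after = vocab[greedy_pick(calibrate_top_k(logits, relevance, k=2, alpha=1.0))]
print(before, "->", after)
```

In this toy step the grounded candidate is already among the top-k, which mirrors the abstract's point that the failure is one of selection rather than of missing candidates. Restricting the adjustment to the top-k means low-probability tokens such as "frisbee" are not promoted even when their relevance score is high.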

Code & Models

Repositories

jiahechen2002/dlc · PyTorch · Official

Taxonomy

Topics: Brain Tumor Detection and Classification · Functional Brain Connectivity Studies · Cell Image Analysis Techniques

Methods: Contrastive Language-Image Pre-training · ALIGN