Conscious Gaze: Adaptive Attention Mechanisms for Hallucination Mitigation in Vision-Language Models
Weijue Bu, Guan Yuan, Guixian Zhang

TL;DR
Conscious Gaze (CG-VLM) is a training-free, inference-time framework that reduces object hallucinations in vision-language models. It uses game-theoretic interpretability to detect when visual grounding is needed, then reorients attention toward visual tokens.
Contribution
Introduces a training-free, inference-time method that uses game-theoretic interpretability to dynamically correct attention drift in vision-language models.
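To make "correcting attention drift" concrete, here is a minimal sketch of one way attention could be shifted toward visual tokens at inference time. It is not the paper's method: the function name `boost_visual_attention`, the scaling factor `alpha`, and the scale-then-renormalize rule are all assumptions chosen for illustration, and the sketch operates on a single row of attention weights rather than on a real model.

```python
from typing import List, Sequence


def boost_visual_attention(
    attn: Sequence[float],
    visual_mask: Sequence[bool],
    alpha: float,
) -> List[float]:
    """Scale weights on visual tokens by (1 + alpha), then renormalize.

    attn        -- one row of attention weights (non-negative, sums to 1)
    visual_mask -- True where the attended token is a visual token
    alpha       -- hypothetical boost strength; 0.0 leaves attn unchanged
    """
    if len(attn) != len(visual_mask):
        raise ValueError("attn and visual_mask must have the same length")
    if alpha < 0:
        raise ValueError("alpha must be non-negative")

    # Up-weight visual positions; leave text positions untouched.
    scaled = [
        w * (1.0 + alpha) if is_visual else w
        for w, is_visual in zip(attn, visual_mask)
    ]

    # Renormalize so the row is a probability distribution again.
    total = sum(scaled)
    if total == 0:
        return list(attn)
    return [w / total for w in scaled]


# Toy row: two visual tokens followed by two text tokens.
row = [0.1, 0.1, 0.4, 0.4]
mask = [True, True, False, False]
boosted = boost_visual_attention(row, mask, alpha=1.0)
visual_mass_before = sum(w for w, m in zip(row, mask) if m)
visual_mass_after = sum(w for w, m in zip(boosted, mask) if m)
```

In this toy row the attention mass on visual tokens rises from 0.2 to about 0.33. A context-aware variant would set `alpha` from a per-step signal rather than a constant.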
Findings
Achieves state-of-the-art results on POPE and CHAIR benchmarks.
Effectively reduces object hallucinations without degrading model capabilities.
Provides a principled, context-aware attention correction mechanism.
Abstract
Large Vision-Language Models (VLMs) often exhibit text inertia, where attention drifts from visual evidence toward linguistic priors, resulting in object hallucinations. Existing decoding strategies intervene only at the output logits and thus cannot correct internal reasoning drift, while recent internal-control methods based on heuristic head suppression or global steering vectors lack principled grounding. We introduce Conscious Gaze (CG-VLM), a training-free, inference-time framework that converts game-theoretic interpretability into actionable decoding control. A Cognitive Demand Sensor built on Harsanyi interactions estimates instantaneous vision-text synergy and identifies moments when visual grounding is necessary. Conditioned on this signal, a Focused Consensus Induction module selectively reorients mid-layer attention toward visual tokens before collapse into text priors.…
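The Harsanyi interaction mentioned in the abstract has a standard definition: for a set of players S and a value function v, the interaction is the sum over all subsets T of S of (-1)^(|S|-|T|) v(T). The sketch below computes it for two players, "vision" and "text". How the paper defines v and how it turns the result into a control signal are not stated in this summary, so the value function, the toy scores, and the names here are hypothetical.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Iterable


def harsanyi_interaction(
    players: Iterable[str],
    value: Callable[[FrozenSet[str]], float],
) -> float:
    """Harsanyi interaction (dividend) of the full player set.

    Computes sum over T subset of S of (-1)^(|S| - |T|) * value(T).
    """
    players = tuple(players)
    n = len(players)
    total = 0.0
    for k in range(n + 1):
        sign = (-1) ** (n - k)
        for subset in combinations(players, k):
            total += sign * value(frozenset(subset))
    return total


# Hypothetical scores: the model's confidence in the next token when only
# the listed modalities are visible (all others masked). Toy numbers only.
toy_scores = {
    frozenset(): 0.0,
    frozenset({"vision"}): 0.2,
    frozenset({"text"}): 0.5,
    frozenset({"vision", "text"}): 1.0,
}

synergy = harsanyi_interaction(["vision", "text"], toy_scores.__getitem__)
```

For two players the formula reduces to v({vision, text}) - v({vision}) - v({text}) + v(∅), which is approximately 0.3 for these toy scores. A positive value means the two modalities contribute more together than the sum of their individual contributions.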
Taxonomy
Topics: Multimodal Machine Learning Applications · Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI)
