Online Self-Calibration Against Hallucination in Vision-Language Models
Minghui Chen, Chenxu Yang, Hengjie Zhu, Dayan Wu, Zheng Lin, Qingyi Si

TL;DR
This paper introduces OSCAR, an online self-calibration framework for vision-language models that reduces hallucinations by leveraging a generative-discriminative gap and iterative refinement.
Contribution
It proposes an online self-calibration method that combines Monte Carlo Tree Search with a dual-granularity reward to mitigate hallucinations in LVLMs.
Findings
OSCAR achieves state-of-the-art hallucination reduction on benchmarks.
The method improves general multimodal capabilities.
It effectively aligns model perception with visual input.
Abstract
Large Vision-Language Models (LVLMs) often suffer from hallucinations, generating descriptions that include visual details absent from the input image. Recent preference alignment methods typically rely on supervision distilled from stronger models such as GPT. However, this offline paradigm introduces a Supervision-Perception Mismatch: the student model is forced to align with fine-grained details beyond its perceptual capacity, learning to guess rather than to see. To obtain reliable self-supervision for online learning, we identify a Generative-Discriminative Gap within LVLMs, where models exhibit higher accuracy on discriminative verification than open-ended generation. Leveraging this capability, we propose Online Self-CAlibRation (OSCAR), a framework that integrates Monte Carlo Tree Search with a Dual-Granularity Reward Mechanism to construct…
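The Generative-Discriminative Gap described in the abstract suggests one way a model's own verification ability could rank its generated captions. The following is a toy sketch of that general idea only, not the OSCAR algorithm: it omits Monte Carlo Tree Search and the Dual-Granularity Reward, and every name in it (`score_caption`, `build_preference_pair`, `toy_verify`, `VISIBLE`) is hypothetical. The verifier is a stand-in for asking the same LVLM a yes/no question about each claim.

```python
from typing import Callable, List, Tuple

Claim = str
Caption = List[Claim]  # a caption, pre-split into atomic claims (assumption)


def score_caption(claims: Caption, verify: Callable[[Claim], float]) -> float:
    """Mean verifier confidence over the claims in one candidate caption."""
    if not claims:
        return 0.0
    return sum(verify(c) for c in claims) / len(claims)


def build_preference_pair(
    candidates: List[Caption], verify: Callable[[Claim], float]
) -> Tuple[Caption, Caption]:
    """Return (chosen, rejected): the best- and worst-scoring candidates."""
    ranked = sorted(candidates, key=lambda c: score_caption(c, verify))
    return ranked[-1], ranked[0]


# Hypothetical stand-in for discriminative verification by the model itself:
# a claim is "verified" if it names something assumed visible in the image.
VISIBLE = {"a dog", "a red ball"}


def toy_verify(claim: Claim) -> float:
    return 1.0 if claim in VISIBLE else 0.0


candidates = [["a dog", "a red ball"], ["a dog", "a frisbee"]]
chosen, rejected = build_preference_pair(candidates, toy_verify)
```

In this toy setup the caption mentioning the unseen frisbee scores lower and becomes the rejected sample, so the preference pair comes from the model's own judgments instead of from a stronger external model.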
