Countering the Over-Reliance Trap: Mitigating Object Hallucination for LVLMs via a Self-Validation Framework

Shiyu Liu; Xinyi Wen; Zhibin Lan; Ante Wang; Jinsong Su

arXiv:2601.22451 · cs.CV · April 9, 2026

TL;DR

This paper introduces a training-free self-validation framework that reduces object hallucination in LVLM image captioning by having the model verify that the objects it describes exist in the image.

Contribution

The paper presents a self-validation approach that counters LVLMs' over-reliance on language priors and thereby mitigates object hallucination, with no further training required.

Findings

01

Achieved a 65.6% improvement on the CHAIR_I metric with LLaVA-v1.5-7B.

02

Demonstrated the effectiveness of self-validation in reducing hallucinations.

03

Outperformed previous state-of-the-art methods in object hallucination mitigation.
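
For readers unfamiliar with the metric in the first finding, CHAIR_I is commonly defined as the share of mentioned object instances that do not appear in the image, so lower is better. The sketch below computes it over a toy set of captions and shows how a relative improvement could be derived. The function names, the toy data, and the reading of "improvement" as a relative reduction are assumptions for illustration; this summary does not state how the 65.6% figure was computed.

```python
def chair_i(captions):
    """Corpus-level CHAIR_I: hallucinated object mentions / all object mentions.

    `captions` is a list of (mentioned_objects, ground_truth_objects) pairs.
    Lower is better. Returns 0.0 when no objects are mentioned at all.
    """
    total_mentions = 0
    hallucinated = 0
    for mentioned, ground_truth in captions:
        truth = set(ground_truth)
        total_mentions += len(mentioned)
        hallucinated += sum(1 for obj in mentioned if obj not in truth)
    return hallucinated / total_mentions if total_mentions else 0.0


def relative_improvement(baseline, improved):
    """Relative reduction of a lower-is-better score, in percent (assumed reading)."""
    return 100.0 * (baseline - improved) / baseline


# Toy data, not taken from the paper: "bench" is mentioned but not in the image.
toy_captions = [
    (["dog", "frisbee", "bench"], {"dog", "frisbee"}),
    (["cat", "sofa"], {"cat", "sofa"}),
]
print(chair_i(toy_captions))
```

Under this reading, a method that halves the baseline CHAIR_I would be reported as a 50% improvement.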

Abstract

Despite progress in Large Vision Language Models (LVLMs), object hallucination remains a critical issue in the image captioning task, where models generate descriptions of non-existent objects, compromising their reliability. Previous work attributes this to LVLMs' over-reliance on language priors and attempts to mitigate it through logits calibration. However, these studies still lack a thorough analysis of the over-reliance. To gain a deeper understanding of it, we conduct a series of preliminary experiments, which indicate that as the generation length increases, LVLMs' over-reliance on language priors leads to inflated probabilities of hallucinated object tokens, consequently exacerbating object hallucination. To circumvent this issue, we propose Language-Prior-Free Verification to enable LVLMs to faithfully verify the confidence of object existence. Based on this, we propose a novel…
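
The idea of verifying object existence by confidence can be illustrated with a minimal sketch. It assumes a hypothetical `confidence_fn` that returns a score in [0, 1] for how strongly a model believes an object is present in the image, queried independently of the caption generated so far. The function name, the threshold, and the stubbed scores are all assumptions; the abstract is truncated before it describes the framework, so this is not the authors' procedure.

```python
from typing import Callable, Iterable, List, Tuple

Scored = Tuple[str, float]


def validate_objects(
    objects: Iterable[str],
    confidence_fn: Callable[[str], float],
    threshold: float = 0.5,  # assumed cutoff, not from the paper
) -> Tuple[List[Scored], List[Scored]]:
    """Split object mentions into kept and rejected lists by existence confidence."""
    kept: List[Scored] = []
    rejected: List[Scored] = []
    for obj in objects:
        score = confidence_fn(obj)
        if score >= threshold:
            kept.append((obj, score))
        else:
            rejected.append((obj, score))
    return kept, rejected


# Stub standing in for a real LVLM query; the scores are made up for illustration.
stub_scores = {"dog": 0.93, "frisbee": 0.81, "bench": 0.12}
kept, rejected = validate_objects(stub_scores, lambda obj: stub_scores[obj])
print("kept:", kept)
print("rejected:", rejected)
```

In a real setting the stub would be replaced by a model call, and the rejected mentions would be candidates for removal from the caption.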
