Countering the Over-Reliance Trap: Mitigating Object Hallucination for LVLMs via a Self-Validation Framework
Shiyu Liu, Xinyi Wen, Zhibin Lan, Ante Wang, Jinsong Su

TL;DR
This paper introduces a training-free self-validation framework that significantly reduces object hallucination in LVLMs by verifying object existence, which improves caption accuracy.
Contribution
It presents a novel self-validation approach that mitigates object hallucination in LVLMs without requiring further training, addressing over-reliance on language priors.
Findings
Achieved a 65.6% improvement on the CHAIR_I metric with LLaVA-v1.5-7B.
Demonstrated the effectiveness of self-validation in reducing hallucinations.
Outperformed previous state-of-the-art methods in object hallucination mitigation.
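For readers unfamiliar with the CHAIR metric cited above, the following is a minimal sketch of how its two standard variants are usually computed: the instance-level score is the fraction of mentioned objects that are not in the image, and the sentence-level score is the fraction of captions containing at least one such object. Lower is better for both. The function name chair_scores and the toy data are hypothetical, and the sketch assumes object mentions have already been extracted and normalized; the real benchmark relies on MSCOCO annotations and a synonym list, which are omitted here.

```python
from typing import Iterable, List, Set, Tuple


def chair_scores(
    captions: Iterable[Tuple[List[str], Set[str]]],
) -> Tuple[float, float]:
    """Return (instance-level CHAIR, sentence-level CHAIR).

    Each item pairs the objects mentioned in one caption with the set of
    objects actually present in the image (hypothetical, pre-extracted).
    """
    total_mentions = 0
    hallucinated_mentions = 0
    captions_with_hallucination = 0
    num_captions = 0

    for mentioned, present in captions:
        num_captions += 1
        # An object counts as hallucinated if it is mentioned but not present.
        hallucinated = [obj for obj in mentioned if obj not in present]
        total_mentions += len(mentioned)
        hallucinated_mentions += len(hallucinated)
        if hallucinated:
            captions_with_hallucination += 1

    chair_i = hallucinated_mentions / total_mentions if total_mentions else 0.0
    chair_s = captions_with_hallucination / num_captions if num_captions else 0.0
    return chair_i, chair_s


# Toy data: the first caption mentions a "bench" that is not in the image.
toy = [
    (["dog", "frisbee", "bench"], {"dog", "frisbee"}),
    (["cat"], {"cat", "sofa"}),
]
chair_i, chair_s = chair_scores(toy)
print(chair_i, chair_s)
```

In the toy data, one of four mentioned objects is absent from its image, and one of two captions contains a hallucinated object.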
Abstract
Despite progress in Large Vision Language Models (LVLMs), object hallucination remains a critical issue in the image captioning task, where models generate descriptions of non-existent objects, compromising their reliability. Previous work attributes this to LVLMs' over-reliance on language priors and attempts to mitigate it through logit calibration. However, these approaches still lack a thorough analysis of the over-reliance. To gain a deeper understanding of it, we conduct a series of preliminary experiments, which indicate that as the generation length increases, LVLMs' over-reliance on language priors inflates the probability of hallucinated object tokens, consequently exacerbating object hallucination. To circumvent this issue, we propose Language-Prior-Free Verification to enable LVLMs to faithfully verify the confidence of object existence. Based on this, we propose a novel…
