Analyzing and Mitigating Object Hallucination in Large Vision-Language Models
Yiyang Zhou, Chenhang Cui, Jaehong Yoon, Linjun Zhang, Zhun Deng, Chelsea Finn, Mohit Bansal, Huaxiu Yao

TL;DR
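This paper introduces LURE, a post-hoc algorithm that reduces object hallucination in large vision-language models (LVLMs) by rewriting their descriptions. The rewriting is guided by an analysis of the key factors behind hallucination, such as object co-occurrence, decoding uncertainty, and object position, and it yields more accurate image descriptions.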
Contribution
The paper presents LURE, a novel, model-agnostic, post-hoc method that rectifies object hallucination in LVLMs by reconstructing less hallucinatory descriptions, guided by a statistical analysis of the factors that cause hallucination.
Findings
LURE achieves a 23% improvement in general object hallucination evaluation metrics over the previous best approach.
LURE outperforms previous approaches in GPT and human evaluations.
LURE is model-agnostic: it operates post hoc on generated descriptions, so it can be paired with any LVLM.
Abstract
Large vision-language models (LVLMs) have shown remarkable abilities in understanding visual information with human languages. However, LVLMs still suffer from object hallucination, which is the problem of generating descriptions that include objects that do not actually exist in the images. This can negatively impact many vision-language tasks, such as visual summarization and reasoning. To address this issue, we propose a simple yet powerful algorithm, LVLM Hallucination Revisor (LURE), to post-hoc rectify object hallucination in LVLMs by reconstructing less hallucinatory descriptions. LURE is grounded in a rigorous statistical analysis of the key factors underlying object hallucination, including co-occurrence (the frequent appearance of certain objects alongside others in images), uncertainty (objects with higher uncertainty during LVLM decoding), and object position (hallucination…
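To make the three factors named in the abstract concrete, here is a minimal Python sketch that scores each object mentioned in a generated description and masks the suspicious ones before a rewriting step. It is not the LURE implementation, and the revisor model that would rewrite the masked description is not shown. Everything beyond the factor names is an assumption: the hypothetical `Mention` record and helper functions, negative log-probability as the uncertainty measure, relative token index as the position measure (the abstract is cut off before it defines this factor), the thresholds, and the hypothetical `[IDK]` placeholder.

```python
import math
from collections import Counter
from itertools import combinations
from typing import NamedTuple

# Hypothetical placeholder used to mask objects that may be hallucinated.
PLACEHOLDER = "[IDK]"


class Mention(NamedTuple):
    obj: str          # object word as it appears in the description
    token_index: int  # position of the object token in the generated sequence
    prob: float       # probability the LVLM assigned to that token while decoding


def cooccurrence_counts(annotations):
    """Count how often each unordered pair of objects appears in the same image."""
    counts = Counter()
    for objects in annotations:
        for pair in combinations(sorted(set(objects)), 2):
            counts[pair] += 1
    return counts


def cooccurrence_score(obj, others, counts):
    """Sum of co-occurrence counts between obj and the other mentioned objects."""
    return sum(counts[tuple(sorted((obj, o)))] for o in others if o != obj)


def factor_report(mentions, num_tokens, counts):
    """Compute the three (assumed) factor scores for each mentioned object.

    One entry per distinct object; a repeated object keeps its last mention.
    """
    names = [m.obj for m in mentions]
    report = {}
    for m in mentions:
        report[m.obj] = {
            "cooccurrence": cooccurrence_score(m.obj, names, counts),
            "uncertainty": -math.log(m.prob),        # assumed: negative log-probability
            "position": m.token_index / num_tokens,  # assumed: 0.0 = start, near 1.0 = end
        }
    return report


def mask_suspects(words, report, max_uncertainty=1.0, max_position=0.8):
    """Replace objects whose uncertainty or relative position exceeds a threshold."""
    suspects = {
        obj for obj, f in report.items()
        if f["uncertainty"] > max_uncertainty or f["position"] > max_position
    }
    return [PLACEHOLDER if w in suspects else w for w in words]


# Toy example: "bench" is low-probability and mentioned late, so it gets masked.
annotations = [["dog", "frisbee", "grass"], ["dog", "grass"], ["dog", "frisbee"]]
counts = cooccurrence_counts(annotations)
words = "a dog catches a frisbee near a bench".split()
mentions = [Mention("dog", 1, 0.9), Mention("frisbee", 4, 0.7), Mention("bench", 7, 0.2)]
report = factor_report(mentions, len(words), counts)
masked = mask_suspects(words, report)
print(" ".join(masked))  # a dog catches a frisbee near a [IDK]
```

Masking rather than deleting a suspect object keeps its location visible to whatever rewriting step follows. Co-occurrence is only reported here; how it should feed into the rewrite is left open, since the abstract excerpt does not say.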
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
Methods: Multi-Head Attention · Attention Is All You Need · Cosine Annealing · Linear Layer · Adam · Weight Decay · Residual Connection · Linear Warmup With Cosine Annealing · Layer Normalization
