Analyzing and Mitigating Object Hallucination in Large Vision-Language Models

Yiyang Zhou; Chenhang Cui; Jaehong Yoon; Linjun Zhang; Zhun Deng; Chelsea Finn; Mohit Bansal; Huaxiu Yao

arXiv:2310.00754·cs.LG·March 19, 2024·29 cites

TL;DR

This paper introduces LURE, a post-hoc algorithm that reduces object hallucination in large vision-language models by revising their descriptions, guided by a statistical analysis of factors such as co-occurrence, uncertainty, and object position.

Contribution

The paper presents LURE, a model-agnostic method that rectifies object hallucination in LVLMs post hoc by reconstructing less hallucinatory descriptions, grounded in a statistical analysis of the factors underlying hallucination.

Findings

01

LURE achieves a 23% improvement in general object hallucination evaluation metrics over the previous best approach.

02

LURE outperforms previous approaches in GPT and human evaluations.

03

LURE can be integrated with any LVLM as a post-hoc revisor.

Abstract

Large vision-language models (LVLMs) have shown remarkable abilities in understanding visual information with human languages. However, LVLMs still suffer from object hallucination, which is the problem of generating descriptions that include objects that do not actually exist in the images. This can negatively impact many vision-language tasks, such as visual summarization and reasoning. To address this issue, we propose a simple yet powerful algorithm, LVLM Hallucination Revisor (LURE), to post-hoc rectify object hallucination in LVLMs by reconstructing less hallucinatory descriptions. LURE is grounded in a rigorous statistical analysis of the key factors underlying object hallucination, including co-occurrence (the frequent appearance of certain objects alongside others in images), uncertainty (objects with higher uncertainty during LVLM decoding), and object position (hallucination…
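
The first two factors named in the abstract can be illustrated with a small Python sketch. It is not the LURE implementation: the function names, the toy annotations, the threshold, and the choice of negative log-probability as the uncertainty score are all assumptions made here for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_counts(object_sets):
    """Count how often each unordered pair of objects appears in the same image."""
    pair_counts = Counter()
    for objects in object_sets:
        # Deduplicate and sort so each pair has one canonical key.
        for a, b in combinations(sorted(set(objects)), 2):
            pair_counts[(a, b)] += 1
    return pair_counts


def token_uncertainty(prob):
    """Uncertainty of a decoded token, assumed here to be -log(probability)."""
    return -math.log(prob)


def flag_uncertain_objects(object_probs, threshold):
    """Return objects whose assumed uncertainty score exceeds the threshold."""
    return [
        obj for obj, p in object_probs.items()
        if token_uncertainty(p) > threshold
    ]


# Hypothetical per-image object annotations (not data from the paper).
annotations = [
    ["person", "dog", "frisbee"],
    ["person", "dog"],
    ["person", "car"],
]
counts = cooccurrence_counts(annotations)

# Hypothetical decoding probabilities for objects mentioned in one description.
decoded = {"dog": 0.9, "frisbee": 0.2}
flagged = flag_uncertain_objects(decoded, threshold=1.0)
```

In this toy setup, objects that frequently co-occur with others already in the description, or that were decoded with low probability, would be the candidates a revisor might re-examine. How LURE actually combines these signals is described in the paper itself.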

Code & Models

Repositories

yiyangzhou/lure
PyTorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · COVID-19 diagnosis using AI · Domain Adaptation and Few-Shot Learning

Methods: Multi-Head Attention · Attention Is All You Need · Cosine Annealing · Linear Layer · Adam · Weight Decay · Residual Connection · Linear Warmup With Cosine Annealing · Layer Normalization