A Comprehensive Analysis for Visual Object Hallucination in Large Vision-Language Models
Liqiang Jing, Guiming Hardy Chen, Ehsan Aghazadeh, Xin Eric Wang, Xinya Du

TL;DR
This paper investigates the causes of visual object hallucination in large vision-language models, analyzing model components and proposing mitigation strategies, supported by new benchmarks for evaluating hallucination issues.
Contribution
It provides a comprehensive analysis of hallucination sources in LVLMs and introduces two benchmarks to evaluate and mitigate these hallucinations.
Findings
Identified key sources of hallucination in model components
Proposed targeted mitigation methods for each component
Developed two benchmarks for hallucination evaluation
Abstract
Large Vision-Language Models (LVLMs) demonstrate remarkable capabilities in multimodal tasks, but visual object hallucination remains a persistent issue. It refers to scenarios where models generate inaccurate visual object-related information based on the query input, potentially leading to misinformation and concerns about safety and reliability. Previous works focus on the evaluation and mitigation of visual hallucinations, but the underlying causes have not been comprehensively investigated. In this paper, we analyze each component of LLaVA-like LVLMs -- the large language model, the vision backbone, and the projector -- to identify potential sources of error and their impact. Based on our observations, we propose methods to mitigate hallucination for each problematic component. Additionally, we developed two hallucination benchmarks: QA-VisualGenome, which emphasizes attribute and…
Taxonomy
Topics: Visual Attention and Saliency Detection · Brain Tumor Detection and Classification · Retinal Imaging and Analysis
Methods: Focus
