Evaluating and Analyzing Relationship Hallucinations in Large Vision-Language Models
Mingrui Wu, Jiayi Ji, Oucheng Huang, Jiale Li, Yuhang Wu, Xiaoshuai Sun, Rongrong Ji

TL;DR
This paper introduces R-Bench, a new benchmark for evaluating relationship hallucinations in vision-language models, revealing their reliance on common sense and difficulty with spatial reasoning.
Contribution
The paper presents R-Bench, a comprehensive benchmark for assessing relationship hallucinations in LVLMs, and analyzes the causes of these hallucinations, including dataset biases and model limitations.
Findings
LVLMs often hallucinate relationships due to dataset biases.
Current LVLMs rely heavily on common sense over visual content.
Models struggle with spatial reasoning in visual relationships.
Abstract
Hallucination is a prevalent concern in existing Large Vision-Language Models (LVLMs). Previous efforts have primarily focused on investigating object hallucinations, which can be easily alleviated by introducing object detectors. However, these efforts neglect hallucinations in inter-object relationships, which are essential for visual comprehension. In this work, we introduce R-Bench, a novel benchmark for evaluating Vision Relationship Hallucination. R-Bench features image-level questions that focus on the existence of relationships and instance-level questions that assess local visual comprehension. We identify three types of relationship co-occurrences that lead to hallucinations: relationship-relationship, subject-relationship, and relationship-object. The visual instruction tuning dataset's long-tail distribution significantly impacts LVLMs' understanding of visual…
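The three co-occurrence types named in the abstract can be illustrated with a small counting sketch. This is not the authors' code: the function name cooccurrence_counts, the (subject, relationship, object) triplet format, and the toy annotations are all assumptions made for illustration, and the paper's exact definitions may differ. Here subject-relationship and relationship-object pairs are counted within each triplet, and relationship-relationship pairs are counted when two distinct relationships appear in the same image.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(images):
    """Count three kinds of co-occurrence over per-image triplet lists.

    `images` is a list with one entry per image; each entry is a list of
    (subject, relationship, object) triplets. This format is hypothetical.
    """
    subj_rel = Counter()  # (subject, relationship) pairs within a triplet
    rel_obj = Counter()   # (relationship, object) pairs within a triplet
    rel_rel = Counter()   # pairs of distinct relationships in the same image

    for triplets in images:
        for subj, rel, obj in triplets:
            subj_rel[(subj, rel)] += 1
            rel_obj[(rel, obj)] += 1
        # Sort so each unordered relationship pair has one canonical key.
        distinct_rels = sorted({rel for _, rel, _ in triplets})
        for a, b in combinations(distinct_rels, 2):
            rel_rel[(a, b)] += 1

    return subj_rel, rel_obj, rel_rel


# Toy annotations, invented for this example only.
toy_images = [
    [("person", "riding", "horse"), ("person", "wearing", "hat")],
    [("person", "riding", "bike")],
    [("dog", "sitting on", "sofa"), ("person", "sitting on", "sofa")],
]

subj_rel, rel_obj, rel_rel = cooccurrence_counts(toy_images)
print(subj_rel.most_common(3))
print(rel_obj.most_common(3))
print(rel_rel.most_common(3))
```

Sorting such counts exposes which pairs dominate a dataset and which sit in the long tail, which is the kind of skew the abstract links to relationship hallucinations.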
