Evaluating and Analyzing Relationship Hallucinations in Large Vision-Language Models

Mingrui Wu; Jiayi Ji; Oucheng Huang; Jiale Li; Yuhang Wu; Xiaoshuai Sun; Rongrong Ji

arXiv:2406.16449·cs.CV·July 19, 2024

TL;DR

This paper introduces R-Bench, a new benchmark for evaluating relationship hallucinations in vision-language models, revealing their reliance on common sense and difficulty with spatial reasoning.

Contribution

The paper presents R-Bench, a comprehensive benchmark for assessing relationship hallucinations in LVLMs, and analyzes the causes of these hallucinations, including dataset biases and model limitations.

Findings

01

LVLMs often hallucinate relationships due to dataset biases.

02

Current LVLMs rely heavily on common sense over visual content.

03

Models struggle with spatial reasoning in visual relationships.

Abstract

The issue of hallucinations is a prevalent concern in existing Large Vision-Language Models (LVLMs). Previous efforts have primarily focused on investigating object hallucinations, which can be easily alleviated by introducing object detectors. However, these efforts neglect hallucinations in inter-object relationships, which are essential for visual comprehension. In this work, we introduce R-Bench, a novel benchmark for evaluating Vision Relationship Hallucination. R-Bench features image-level questions that focus on the existence of relationships and instance-level questions that assess local visual comprehension. We identify three types of relationship co-occurrences that lead to hallucinations: relationship-relationship, subject-relationship, and relationship-object. The visual instruction tuning dataset's long-tail distribution significantly impacts LVLMs' understanding of visual…
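
The three co-occurrence types named in the abstract can be illustrated by counting pairs over (subject, relationship, object) triplets. The following is a minimal sketch, not the authors' procedure: the excerpt does not define how co-occurrence is measured, so the per-image counting rule, the function name `cooccurrence_counts`, and the toy annotations are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy annotations (not taken from R-Bench):
# image id -> list of (subject, relationship, object) triplets.
annotations = {
    "img1": [("man", "riding", "horse"), ("man", "wearing", "hat")],
    "img2": [("woman", "riding", "bike"), ("woman", "wearing", "helmet")],
    "img3": [("man", "riding", "horse")],
}

def cooccurrence_counts(annotations):
    """Count the three pair types over all images (assumed counting rule)."""
    subj_rel = Counter()  # subject-relationship pairs
    rel_obj = Counter()   # relationship-object pairs
    rel_rel = Counter()   # relationship pairs appearing in the same image
    for triplets in annotations.values():
        for subj, rel, obj in triplets:
            subj_rel[(subj, rel)] += 1
            rel_obj[(rel, obj)] += 1
        # Each unordered pair of distinct relationships counts once per image.
        rels = sorted({rel for _, rel, _ in triplets})
        for a, b in combinations(rels, 2):
            rel_rel[(a, b)] += 1
    return subj_rel, rel_obj, rel_rel

subj_rel, rel_obj, rel_rel = cooccurrence_counts(annotations)
print(subj_rel.most_common(3))
print(rel_obj.most_common(3))
print(rel_rel.most_common(3))
```

Under this reading, a heavily skewed count such as a frequent ("man", "riding") pair is the kind of prior a model might fall back on instead of the image content.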

Code & Models

Repositories

mrwu-mac/R-Bench
PyTorch, Official
