PhD: A ChatGPT-Prompted Visual hallucination Evaluation Dataset

Jiazhen Liu, Yuhan Fu, Ruobing Xie, Runquan Xie, Xingwu Sun, Fengzong Lian, Zhanhui Kang, and Xirong Li

arXiv:2403.11116 · cs.CV · April 15, 2025 · 3 citations

TL;DR

This paper introduces PhD, a large-scale dataset built with the help of ChatGPT to evaluate visual hallucinations in multimodal large language models (MLLMs) across various tasks and contexts, and shows that MLLMs' susceptibility to hallucination varies across those tasks and contexts.

Contribution

The paper presents a novel, comprehensive dataset for objective visual hallucination evaluation of MLLMs, utilizing a ChatGPT-assisted pipeline to generate diverse, task-specific, and context-varied hallucination assessment data.

Findings

1. MLLMs show significant variability in hallucination across tasks and modes.
2. The dataset includes over 102k VQA triplets and over 14k images.
3. PhD enables detailed analysis of hallucination patterns in MLLMs.

Abstract

Multimodal Large Language Models (MLLMs) hallucinate, resulting in an emerging topic of visual hallucination evaluation (VHE). This paper contributes a ChatGPT-Prompted visual hallucination evaluation Dataset (PhD) for objective VHE at a large scale. The essence of VHE is to ask an MLLM questions about specific images to assess its susceptibility to hallucination. Depending on what to ask (objects, attributes, sentiment, etc.) and how the questions are asked, we structure PhD along two dimensions, i.e., task and mode. Five visual recognition tasks, ranging from low-level (object / attribute recognition) to middle-level (sentiment / position recognition and counting), are considered. Besides a normal visual QA mode, which we term PhD-base, PhD also asks questions with specious context (PhD-sec) or with incorrect context (PhD-icc), or with AI-generated counter common sense images…
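To make the two-dimensional task × mode structure concrete, here is a minimal Python sketch of how one might tally exact-match accuracy for each (task, mode) cell of a set of VQA triplets. The `Triplet` fields, the mode labels (`"base"`, `"sec"`, `"icc"`), and the short yes/no style answers are assumptions for illustration only; they are not the released dataset's schema, and plain accuracy is not necessarily the metric the paper reports.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Triplet:
    """One VQA item. Field names are hypothetical, not the released PhD schema."""
    image_id: str
    question: str  # may carry specious or incorrect context, depending on the mode
    answer: str    # ground-truth answer; assumed to be a short string such as "yes"/"no"
    task: str      # e.g. "object", "attribute", "sentiment", "position", "counting"
    mode: str      # e.g. "base", "sec", "icc" (mode labels assumed)


def normalize(text: str) -> str:
    """Lowercase and trim so 'Yes ' and 'yes' count as the same answer."""
    return text.strip().lower()


def accuracy_by_task_and_mode(triplets, predictions):
    """Return exact-match accuracy for every (task, mode) cell that has data."""
    if len(triplets) != len(predictions):
        raise ValueError("need exactly one prediction per triplet")
    correct = defaultdict(int)
    total = defaultdict(int)
    for item, pred in zip(triplets, predictions):
        key = (item.task, item.mode)
        total[key] += 1
        correct[key] += int(normalize(pred) == normalize(item.answer))
    return {key: correct[key] / total[key] for key in total}
```

Comparing a cell such as `("object", "base")` with `("object", "sec")` for the same model is one simple way to see how much misleading context shifts its answers; grouping by both keys keeps the task and mode effects separable.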

Code & Models

Repositories

jiazhen-code/PhD (official)

Datasets

AIMClab-RUC/PhD (dataset, 542 downloads)
