PhD: A ChatGPT-Prompted Visual hallucination Evaluation Dataset
Jiazhen Liu, Yuhan Fu, Ruobing Xie, Runquan Xie, Xingwu Sun, Fengzong Lian, Zhanhui Kang, and Xirong Li

TL;DR
This paper introduces PhD, a large-scale dataset created using ChatGPT to evaluate visual hallucinations in multimodal large language models across various tasks and contexts, revealing variability in hallucination susceptibility.
Contribution
The paper presents a novel, comprehensive dataset for objective visual hallucination evaluation of MLLMs, utilizing a ChatGPT-assisted pipeline to generate diverse, task-specific, and context-varied hallucination assessment data.
Findings
MLLMs show significant variability in hallucination across tasks and modes.
The dataset includes over 102k VQA triplets and 14k images.
PhD enables detailed analysis of hallucination patterns in MLLMs.
Abstract
Multimodal Large Language Models (MLLMs) hallucinate, resulting in an emerging topic of visual hallucination evaluation (VHE). This paper contributes a ChatGPT-Prompted visual hallucination evaluation Dataset (PhD) for objective VHE at a large scale. The essence of VHE is to ask an MLLM questions about specific images to assess its susceptibility to hallucination. Depending on what to ask (objects, attributes, sentiment, etc.) and how the questions are asked, we structure PhD along two dimensions, i.e. task and mode. Five visual recognition tasks, ranging from low-level (object / attribute recognition) to middle-level (sentiment / position recognition and counting), are considered. Besides a normal visual QA mode, which we term PhD-base, PhD also asks questions with specious context (PhD-sec) or with incorrect context (PhD-icc), or with AI-generated counter common sense images…
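The task-by-mode structure described in the abstract can be sketched as a small data structure with a per-cell score. This is an illustrative sketch only, not the authors' released format or metric: the `VQATriplet` class, its field names, the short mode labels, the yes/no answer format, and the use of plain accuracy are all assumptions made for the example. Only the three modes named in the excerpt are listed, since the name of the fourth is cut off.

```python
from collections import defaultdict
from dataclasses import dataclass

# Five tasks named in the abstract; the labels themselves are hypothetical.
TASKS = ("object", "attribute", "sentiment", "position", "counting")
# Modes named in the excerpt (PhD-base, PhD-sec, PhD-icc); the fourth is truncated there.
MODES = ("base", "sec", "icc")


@dataclass(frozen=True)
class VQATriplet:
    """Hypothetical record: one image, one question, one ground-truth answer."""
    image_id: str
    question: str
    answer: str  # assumed yes/no format
    task: str
    mode: str


def accuracy_by_task_and_mode(triplets, predictions):
    """Return plain accuracy for every (task, mode) cell that has samples."""
    if len(triplets) != len(predictions):
        raise ValueError("triplets and predictions must have the same length")
    correct = defaultdict(int)
    total = defaultdict(int)
    for triplet, predicted in zip(triplets, predictions):
        key = (triplet.task, triplet.mode)
        total[key] += 1
        # Compare case-insensitively so "Yes" and "yes" count as the same answer.
        if predicted.strip().lower() == triplet.answer.strip().lower():
            correct[key] += 1
    return {key: correct[key] / total[key] for key in total}


# Made-up samples, for illustration only.
samples = [
    VQATriplet("img1", "Is there a dog in the image?", "yes", "object", "base"),
    VQATriplet("img1", "Is there a cat in the image?", "no", "object", "base"),
    VQATriplet("img2", "Are there three cups on the table?", "no", "counting", "icc"),
]
model_answers = ["Yes", "yes", "no"]
scores = accuracy_by_task_and_mode(samples, model_answers)
```

Grouping scores by (task, mode) rather than reporting one overall number is what makes it possible to see where a model's susceptibility to hallucination varies.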
Taxonomy
Topics: Hallucinations in medical conditions · Leprosy Research and Treatment
Methods: Focus
