Evaluating Hallucination in Text-to-Image Diffusion Models with Scene-Graph based Question-Answering Agent
Ziyuan Qin, Dongjie Cheng, Haoyu Wang, Huahui Yi, Yuting Shao, Zhiyuan Fan, Kang Li, Qicheng Lao

TL;DR
This paper introduces a novel automatic evaluation method for text-to-image models using scene-graph based question-answering with large language models, aiming to detect hallucinations and align more closely with human judgments.
Contribution
It proposes a scene-graph based question-answering approach combined with LLMs for more accurate T2I evaluation, and provides a new dataset with human scores for validation.
Findings
Method aligns better with human scoring than existing metrics
Generated a dataset of 12,000 images with human ratings
Demonstrates effectiveness in detecting hallucinations
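
The summary does not say how agreement with human scoring is measured. One common choice for this kind of comparison is a rank correlation between metric scores and human ratings. The sketch below computes Spearman's rho in plain Python; the function names and the numbers in the usage note are illustrative assumptions, not values or code from the paper.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(metric_scores, human_scores):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(metric_scores), average_ranks(human_scores)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5
```

For example, `spearman([0.2, 0.5, 0.9], [1, 3, 5])` (made-up scores) evaluates to 1.0 because both lists order the three images identically. A metric that "aligns better with human scoring" would, under this assumption, show a higher rho against the human ratings than competing metrics do.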
Abstract
Contemporary Text-to-Image (T2I) models frequently depend on qualitative human evaluations to assess the consistency between synthesized images and the text prompts. There is a demand for quantitative and automatic evaluation tools, given that human evaluation lacks reproducibility. We believe that an effective T2I evaluation metric should accomplish the following: detect instances where the generated images do not align with the textual prompts, a discrepancy we define as the "hallucination problem" in T2I tasks; record the types and frequency of hallucination issues, aiding users in understanding the causes of errors; and provide a comprehensive and intuitive scoring that is close to human standards. To achieve these objectives, we propose a method based on large language models (LLMs) for conducting question-answering with an extracted scene-graph and created a dataset with human-rated…
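
The three goals listed in the abstract (detect mismatches, record their types and frequency, produce a single score) can be illustrated with a toy sketch. It assumes the scene graphs are already available as subject-relation-object triples and replaces the LLM question-answering step with a simple set lookup. The `Triple` class, the `kind` categories, and the scoring rule are all hypothetical and are not the authors' implementation.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    subject: str
    relation: str
    obj: str
    kind: str  # hypothetical error category: "object", "attribute", or "relation"


def to_question(t):
    """Turn a prompt triple into a yes/no question (assumed template)."""
    return f"Is it true that {t.subject} {t.relation} {t.obj}?"


def evaluate(prompt_triples, image_triples):
    """Check each prompt triple against the image scene graph.

    Stand-in for the LLM: a question is answered "yes" only if the exact
    triple appears in the image graph. Returns (score, errors by kind, log).
    """
    image_facts = {(t.subject, t.relation, t.obj) for t in image_triples}
    errors = Counter()
    log = []
    for t in prompt_triples:
        answer = (t.subject, t.relation, t.obj) in image_facts
        log.append((to_question(t), answer))
        if not answer:
            errors[t.kind] += 1  # record the type of hallucination
    total = len(prompt_triples)
    score = 1.0 if total == 0 else (total - sum(errors.values())) / total
    return score, dict(errors), log


# Prompt: "a red cube on a blue table"; the generated image shows a green cube.
prompt = [
    Triple("cube", "is", "red", "attribute"),
    Triple("table", "is", "blue", "attribute"),
    Triple("cube", "on", "table", "relation"),
]
image = [
    Triple("cube", "is", "green", "attribute"),
    Triple("table", "is", "blue", "attribute"),
    Triple("cube", "on", "table", "relation"),
]
score, errors, log = evaluate(prompt, image)
```

Here two of the three questions are answered "yes", so `score` is 2/3 and `errors` is `{"attribute": 1}`. In a real pipeline the exact-match lookup would be replaced by a model that can handle paraphrases and partial matches.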
Taxonomy
Topics: Image Retrieval and Classification Techniques · Brain Tumor Detection and Classification · Machine Learning in Healthcare
Methods: ALIGN
