Evaluating Hallucination in Text-to-Image Diffusion Models with Scene-Graph based Question-Answering Agent

Ziyuan Qin; Dongjie Cheng; Haoyu Wang; Huahui Yi; Yuting Shao; Zhiyuan Fan; Kang Li; Qicheng Lao

arXiv:2412.05722·cs.CV·December 10, 2024

Open Access

TL;DR

This paper introduces a novel automatic evaluation method for text-to-image models using scene-graph based question-answering with large language models, aiming to detect hallucinations and align more closely with human judgments.

Contribution

It proposes a scene-graph based question-answering approach combined with LLMs for more accurate T2I evaluation, and provides a new dataset with human scores for validation.

Findings

01. The method aligns better with human scoring than existing metrics.

02. The authors generated a dataset of 12,000 images with human ratings.

03. The method is shown to be effective at detecting hallucinations.

Abstract

Contemporary Text-to-Image (T2I) models frequently depend on qualitative human evaluations to assess the consistency between synthesized images and the text prompts. There is a demand for quantitative and automatic evaluation tools, given that human evaluation lacks reproducibility. We believe that an effective T2I evaluation metric should accomplish the following: detect instances where the generated images do not align with the textual prompts, a discrepancy we define as the "hallucination problem" in T2I tasks; record the types and frequency of hallucination issues, aiding users in understanding the causes of errors; and provide a comprehensive and intuitive score that is close to human standards. To achieve these objectives, we propose a method based on large language models (LLMs) for conducting question-answering with an extracted scene-graph and create a dataset with human-rated…
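To make the three goals above concrete, here is a minimal Python sketch of scene-graph question-answering with a per-type failure tally. It is an illustration under stated assumptions, not the authors' implementation: the `Triple` schema, the three `kind` categories, the question templates, the `evaluate` scoring rule, and the `stub_answer` function are all hypothetical. In practice `answer_fn` would be a model that answers questions about the generated image; a keyword stub stands in for it here so the example runs on its own.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One scene-graph element (hypothetical schema, not from the paper)."""
    subject: str
    relation: str
    obj: str
    kind: str  # illustrative categories: "object", "attribute", "relation"


def triple_to_question(t: Triple) -> str:
    """Turn one scene-graph triple into a yes/no question."""
    if t.kind == "object":
        return f"Is there a {t.subject} in the image?"
    if t.kind == "attribute":
        return f"Is the {t.subject} {t.obj}?"
    return f"Is the {t.subject} {t.relation} the {t.obj}?"


def evaluate(triples, answer_fn):
    """Ask one question per triple and tally failed checks by kind.

    answer_fn is a placeholder for a question-answering model; it is
    assumed to return True when the image satisfies the question.
    Returns (fraction of checks passed, failure counts per kind).
    """
    failures = Counter()
    for t in triples:
        if not answer_fn(triple_to_question(t)):
            failures[t.kind] += 1
    total = len(triples)
    score = 1.0 - sum(failures.values()) / total if total else 0.0
    return score, dict(failures)


# Scene graph for the prompt "a black cat on a sofa" (hand-written here;
# the extraction step is assumed to happen elsewhere).
prompt_graph = [
    Triple("cat", "exists", "", "object"),
    Triple("cat", "is", "black", "attribute"),
    Triple("cat", "on", "sofa", "relation"),
    Triple("sofa", "exists", "", "object"),
]


def stub_answer(question: str) -> bool:
    """Stand-in answerer: pretends the image shows a cat of the wrong color."""
    return "black" not in question


score, failures = evaluate(prompt_graph, stub_answer)
print(score, failures)
```

With the stub answerer, one of the four checks fails, and the tally attributes it to the attribute category. Keeping the per-kind counts separate from the single score is what lets a user see which type of mismatch dominates rather than only how large it is.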

Taxonomy

Topics: Image Retrieval and Classification Techniques · Brain Tumor Detection and Classification · Machine Learning in Healthcare

Methods: ALIGN