Evaluating Image Hallucination in Text-to-Image Generation with Question-Answering
Youngsun Lim, Hojun Choi, and Hyunjung Shim

TL;DR
This paper introduces I-HallA, an automated evaluation metric and benchmark dataset that uses visual question answering (VQA) to assess factual accuracy in text-to-image models, and shows that current models often hallucinate facts.
Contribution
The paper presents I-HallA, a novel metric and dataset for evaluating factual correctness in text-to-image generation through VQA-based assessment.
Findings
State-of-the-art text-to-image (TTI) models often hallucinate factual content.
I-HallA metric shows a high correlation (ρ=0.95) with human judgments.
The dataset includes 1.2K image-text pairs with 1,000 curated questions.
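To make the correlation finding concrete, here is a minimal sketch of how agreement between an automated metric and human ratings can be measured. It assumes that ρ denotes a rank correlation such as Spearman's, which this summary does not state explicitly. The function names and the sample numbers are hypothetical and are not taken from the paper.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rank correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-model metric scores and mean human ratings (illustrative only).
metric_scores = [0.2, 0.5, 0.9, 0.7]
human_ratings = [1.0, 2.0, 4.0, 3.0]
rho = spearman_rho(metric_scores, human_ratings)
```

A value of ρ close to 1 means the metric orders the generated images (or models) almost exactly as human raters do, which is the property the finding above reports.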
Abstract
Despite the impressive success of text-to-image (TTI) generation models, existing studies overlook the issue of whether these models accurately convey factual information. In this paper, we focus on the problem of image hallucination, where images created by generation models fail to faithfully depict factual content. To address this, we introduce I-HallA (Image Hallucination evaluation with Question Answering), a novel automated evaluation metric that measures the factuality of generated images through visual question answering (VQA). We also introduce I-HallA v1.0, a curated benchmark dataset for this purpose. As part of this process, we develop a pipeline that generates high-quality question-answer pairs using multiple GPT-4 Omni-based agents, with human judgments to ensure accuracy. Our evaluation protocols measure image hallucination by testing if images from existing TTI models…
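The evaluation idea described in the abstract, asking a VQA model questions about a generated image and checking its answers, can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the `QAPair` type, the `vqa_model` callable, the exact-match comparison, and the accuracy-style aggregation are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Any, Callable, Sequence


@dataclass(frozen=True)
class QAPair:
    """A factual question about the prompt and its reference answer (hypothetical type)."""
    question: str
    answer: str


def _normalize(text: str) -> str:
    # Compare answers case-insensitively and ignore surrounding whitespace.
    return text.strip().lower()


def vqa_factuality_score(
    image: Any,
    qa_pairs: Sequence[QAPair],
    vqa_model: Callable[[Any, str], str],
) -> float:
    """Fraction of questions the VQA model answers correctly for this image.

    Assumption: a higher score means the image depicts the facts more
    faithfully, i.e. it shows less hallucination.
    """
    if not qa_pairs:
        raise ValueError("at least one question-answer pair is required")
    correct = sum(
        1
        for qa in qa_pairs
        if _normalize(vqa_model(image, qa.question)) == _normalize(qa.answer)
    )
    return correct / len(qa_pairs)


def stub_vqa_model(image: Any, question: str) -> str:
    """Stand-in for a real VQA model; returns canned answers for the demo."""
    canned = {
        "How many moons does the planet have?": "two",
        "What color is the flag?": "Red",
    }
    return canned.get(question, "unknown")


qa_pairs = [
    QAPair("How many moons does the planet have?", "two"),
    QAPair("What color is the flag?", "red"),
    QAPair("Which century is depicted?", "18th"),
    QAPair("Is the bridge made of stone?", "yes"),
]
score = vqa_factuality_score(image=None, qa_pairs=qa_pairs, vqa_model=stub_vqa_model)
```

In practice `stub_vqa_model` would be replaced by a call to an actual vision-language model, and per-image scores would be averaged over the benchmark to compare TTI models.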
Taxonomy
Topics: Cell Image Analysis Techniques · Image Processing Techniques and Applications · Advanced Image Processing Techniques
Methods: Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Label Smoothing · Byte Pair Encoding · Absolute Position Encodings · Softmax · Layer Normalization · Dropout · Dense Connections
