Evaluating Image Hallucination in Text-to-Image Generation with Question-Answering

Youngsun Lim, Hojun Choi, and Hyunjung Shim

arXiv:2409.12784·cs.CV·February 11, 2025

TL;DR

This paper introduces I-HallA, an automated evaluation metric and benchmark dataset that uses visual question answering to assess factual accuracy in text-to-image models. The evaluation reveals that current models often hallucinate facts.

Contribution

The paper presents I-HallA, a novel metric and dataset for evaluating factual correctness in text-to-image generation through VQA-based assessment.

Findings

1. State-of-the-art TTI models often hallucinate factual content.

2. The I-HallA metric shows a high correlation (ρ = 0.95) with human judgments.

3. The dataset includes 1.2K image-text pairs with 1,000 curated questions.
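The reported correlation with human judgments can be illustrated with a small sketch. It assumes that ρ denotes a rank correlation such as Spearman's, which this page does not state explicitly. The function names (`average_ranks`, `pearson`, `spearman_rho`) are hypothetical, and the scores in the usage note are made-up toy values, not results from the paper.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; raises if either input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5


def spearman_rho(metric: Sequence[float], human: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    if len(metric) != len(human) or len(metric) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    return pearson(average_ranks(metric), average_ranks(human))
```

For example, `spearman_rho([0.2, 0.5, 0.9], [1, 3, 2])` compares three toy metric scores with three toy human ratings; a value near 1 would mean the metric ranks images in nearly the same order as the human raters.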

Abstract

Despite the impressive success of text-to-image (TTI) generation models, existing studies overlook the issue of whether these models accurately convey factual information. In this paper, we focus on the problem of image hallucination, where images created by generation models fail to faithfully depict factual content. To address this, we introduce I-HallA (Image Hallucination evaluation with Question Answering), a novel automated evaluation metric that measures the factuality of generated images through visual question answering (VQA). We also introduce I-HallA v1.0, a curated benchmark dataset for this purpose. As part of this process, we develop a pipeline that generates high-quality question-answer pairs using multiple GPT-4 Omni-based agents, with human judgments to ensure accuracy. Our evaluation protocols measure image hallucination by testing if images from existing TTI models…
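The VQA-based idea in the abstract can be sketched as scoring an image by the fraction of factual questions that a VQA model answers correctly about it. This is a minimal illustration under stated assumptions, not the paper's scoring rule: the function `vqa_factuality_score`, the `answer_fn` callback, and the exact-match comparison are all hypothetical, and any real VQA model would have to be plugged in by the reader.

```python
from typing import Any, Callable, Sequence, Tuple


def vqa_factuality_score(
    image: Any,
    qa_pairs: Sequence[Tuple[str, str]],
    answer_fn: Callable[[Any, str], str],
) -> float:
    """Fraction of questions whose predicted answer matches the reference.

    `answer_fn(image, question)` is a hypothetical stand-in for a VQA model.
    Answers are compared case-insensitively after trimming whitespace; this
    matching rule is an assumption made for the sketch.
    """
    if not qa_pairs:
        raise ValueError("qa_pairs must contain at least one question")
    correct = 0
    for question, reference in qa_pairs:
        predicted = answer_fn(image, question)
        if predicted.strip().lower() == reference.strip().lower():
            correct += 1
    return correct / len(qa_pairs)
```

A generated image that depicts the prompt's facts faithfully would score close to 1 under this sketch, while a hallucinated image would answer fewer questions correctly and score lower.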

Code & Models

Repositories

hchoi256/i-halla-v1.0 (Official)

Videos

Evaluating Image Hallucination in Text-to-Image Generation with Question-Answering

Taxonomy

Topics: Cell Image Analysis Techniques · Image Processing Techniques and Applications · Advanced Image Processing Techniques

Methods: Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Label Smoothing · Byte Pair Encoding · Absolute Position Encodings · Softmax · Layer Normalization · Dropout · Dense Connections