TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering

Yushi Hu; Benlin Liu; Jungo Kasai; Yizhong Wang; Mari Ostendorf; Ranjay Krishna; Noah A. Smith

arXiv:2303.11897 · cs.CV · August 21, 2023 · 5 cites

TL;DR

TIFA introduces a novel, reference-free evaluation metric for text-to-image models that uses question answering to assess how faithfully generated images match their textual descriptions, correlating well with human judgments.

Contribution

The paper presents TIFA, a new automatic, interpretable, and fine-grained evaluation method for text-to-image models based on visual question answering, along with a comprehensive benchmark dataset.

Findings

01

Current models excel in color and material but struggle with counting and spatial relations.

02

TIFA correlates better with human judgments than existing metrics.

03

The benchmark reveals specific limitations in current text-to-image synthesis models.

Abstract

Despite thousands of researchers, engineers, and artists actively working on improving text-to-image generation models, systems often fail to produce images that accurately align with the text inputs. We introduce TIFA (Text-to-Image Faithfulness evaluation with question Answering), an automatic evaluation metric that measures the faithfulness of a generated image to its text input via visual question answering (VQA). Specifically, given a text input, we automatically generate several question-answer pairs using a language model. We calculate image faithfulness by checking whether existing VQA models can answer these questions using the generated image. TIFA is a reference-free metric that allows for fine-grained and interpretable evaluations of generated images. TIFA also has better correlations with human judgments than existing metrics. Based on this approach, we introduce TIFA v1.0,…
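
The scoring step described in the abstract can be illustrated with a minimal sketch. It assumes that the faithfulness score is the fraction of generated questions the VQA model answers correctly, and that questions are multiple-choice; neither detail is spelled out in the excerpt above. All names here (`QAPair`, `faithfulness_score`, `score_by_type`, `toy_vqa`) are hypothetical and are not taken from the official repository, and the VQA model is replaced by a lookup stub so the example runs without any model weights.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class QAPair:
    """One question-answer pair generated from the text input (hypothetical schema)."""
    question: str
    choices: List[str]
    answer: str
    element_type: str  # illustrative label such as "color" or "counting"


# A VQA model is modeled as any callable: (image, question, choices) -> answer.
VQAFn = Callable[[Any, str, List[str]], str]


def faithfulness_score(image: Any, qa_pairs: List[QAPair], vqa: VQAFn) -> float:
    """Fraction of questions the VQA model answers correctly for this image."""
    if not qa_pairs:
        raise ValueError("at least one question-answer pair is required")
    correct = sum(vqa(image, qa.question, qa.choices) == qa.answer for qa in qa_pairs)
    return correct / len(qa_pairs)


def score_by_type(image: Any, qa_pairs: List[QAPair], vqa: VQAFn) -> Dict[str, float]:
    """Per-element-type accuracy, giving a fine-grained view of where the image fails."""
    groups: Dict[str, List[QAPair]] = defaultdict(list)
    for qa in qa_pairs:
        groups[qa.element_type].append(qa)
    return {t: faithfulness_score(image, qas, vqa) for t, qas in groups.items()}


def toy_vqa(image: Dict[str, str], question: str, choices: List[str]) -> str:
    """Stand-in for a real VQA model: the 'image' is a dict mapping questions to answers."""
    return image.get(question, choices[0])


# Hand-written pairs for the prompt "two red apples on a wooden table".
# In the method described above, a language model would generate these.
qa_pairs = [
    QAPair("What color are the apples?", ["red", "green", "yellow"], "red", "color"),
    QAPair("How many apples are there?", ["1", "2", "3"], "2", "counting"),
    QAPair("What is the table made of?", ["wood", "metal", "glass"], "wood", "material"),
    QAPair("Are the apples on the table?", ["yes", "no"], "yes", "spatial"),
]

# A toy "generated image" that shows three apples instead of two.
toy_image = {
    "What color are the apples?": "red",
    "How many apples are there?": "3",
    "What is the table made of?": "wood",
    "Are the apples on the table?": "yes",
}

overall = faithfulness_score(toy_image, qa_pairs, toy_vqa)  # 0.75
per_type = score_by_type(toy_image, qa_pairs, toy_vqa)      # counting scores 0.0
```

In this toy case the image misses only the counting question, so the overall score is 0.75 and the per-type breakdown points directly at the failed element. That kind of breakdown is what makes a question-based metric interpretable, although the actual question formats and aggregation used by TIFA may differ from this sketch.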

Code & Models

Repositories

Yushi-Hu/tifa
PyTorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Domain Adaptation and Few-Shot Learning

Methods: fail · ALIGN