TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering
Yushi Hu, Benlin Liu, Jungo Kasai, Yizhong Wang, Mari Ostendorf, Ranjay Krishna, Noah A Smith

TL;DR
TIFA introduces a novel, reference-free evaluation metric for text-to-image models that uses question answering to assess how faithfully generated images match their textual descriptions, correlating well with human judgments.
Contribution
The paper presents TIFA, a new automatic, interpretable, and fine-grained evaluation method for text-to-image models based on visual question answering, along with a comprehensive benchmark dataset.
Findings
Current text-to-image models handle color and material well but struggle with counting and spatial relations.
TIFA correlates better with human judgments than existing metrics.
The benchmark reveals specific limitations in current text-to-image synthesis models.
Abstract
Despite thousands of researchers, engineers, and artists actively working on improving text-to-image generation models, systems often fail to produce images that accurately align with the text inputs. We introduce TIFA (Text-to-Image Faithfulness evaluation with question Answering), an automatic evaluation metric that measures the faithfulness of a generated image to its text input via visual question answering (VQA). Specifically, given a text input, we automatically generate several question-answer pairs using a language model. We calculate image faithfulness by checking whether existing VQA models can answer these questions using the generated image. TIFA is a reference-free metric that allows for fine-grained and interpretable evaluations of generated images. TIFA also has better correlations with human judgments than existing metrics. Based on this approach, we introduce TIFA v1.0,…
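The scoring step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes the score is the fraction of generated questions that the VQA model answers correctly, and every name here (`QAPair`, `faithfulness_score`, `toy_vqa`) is hypothetical. The language-model question generator and the real VQA model are replaced by hand-written question-answer pairs and a lookup stub.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable, Sequence


@dataclass(frozen=True)
class QAPair:
    """One question about the text input, with its expected answer (hypothetical type)."""
    question: str
    choices: tuple[str, ...]
    answer: str


# Signature assumed for a VQA model: (image, question, choices) -> chosen answer.
VQAModel = Callable[[Any, str, Sequence[str]], str]


def faithfulness_score(image: Any, qa_pairs: Sequence[QAPair], vqa: VQAModel) -> float:
    """Return the fraction of questions the VQA model answers as expected.

    Assumption: faithfulness is plain accuracy over the question set.
    """
    if not qa_pairs:
        raise ValueError("at least one question-answer pair is required")
    correct = sum(
        1 for qa in qa_pairs if vqa(image, qa.question, qa.choices) == qa.answer
    )
    return correct / len(qa_pairs)


def toy_vqa(image: dict, question: str, choices: Sequence[str]) -> str:
    """Stand-in for a real VQA model: the 'image' is a dict of question -> answer."""
    return image.get(question, choices[0])


# Hand-written pairs for the text input "two red apples on a wooden table".
# In the described pipeline these would come from a language model.
qa_pairs = [
    QAPair("What fruit is shown?", ("apples", "pears"), "apples"),
    QAPair("What color are the apples?", ("green", "red"), "red"),
    QAPair("How many apples are there?", ("one", "two", "three"), "two"),
]

# A pretend generated image that gets the object and color right but the count wrong.
fake_image = {
    "What fruit is shown?": "apples",
    "What color are the apples?": "red",
    "How many apples are there?": "three",
}

score = faithfulness_score(fake_image, qa_pairs, toy_vqa)
```

Because each question targets one element of the text, the per-question results show which element failed (here, the count), which is what makes this kind of score interpretable rather than a single opaque number.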
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Domain Adaptation and Few-Shot Learning
Methods: fail · ALIGN
