TypeScore: A Text Fidelity Metric for Text-to-Image Generative Models

Georgia Gabriela Sampaio; Ruixiang Zhang; Shuangfei Zhai; Jiatao Gu; Josh Susskind; Navdeep Jaitly; Yizhe Zhang

arXiv:2411.02437·cs.CV·November 6, 2024

TL;DR

TypeScore is a new evaluation metric that accurately measures a text-to-image model's ability to generate images with high-fidelity embedded text, providing finer discrimination than existing metrics like CLIPScore.

Contribution

This work introduces TypeScore, a novel metric that assesses the fidelity of embedded text in generated images, enhancing evaluation sensitivity for instruction-following capabilities.

Findings

1. TypeScore outperforms CLIPScore in differentiating models based on embedded text fidelity.

2. The metric effectively evaluates stylistic adherence in image generation.

3. Human studies validate the effectiveness of TypeScore as an evaluation tool.

Abstract

Evaluating text-to-image generative models remains a challenge, despite the remarkable progress in their overall performance. While existing metrics like CLIPScore work for coarse evaluations, they lack the sensitivity to distinguish finer differences as model performance rapidly improves. In this work, we focus on the text rendering aspect of these models, which provides a lens for evaluating a generative model's fine-grained instruction-following capabilities. To this end, we introduce a new evaluation framework called TypeScore to sensitively assess a model's ability to generate images with high-fidelity embedded text by following precise instructions. We argue that this text generation capability serves as a proxy for general instruction-following ability in image synthesis. TypeScore uses an additional image description model and leverages an ensemble dissimilarity…
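
To make the idea of comparing instructed text against rendered text concrete, here is a minimal sketch, not the authors' implementation. It assumes the text read back from the generated image is already available as a string (in TypeScore that role belongs to the image description model, which is not reproduced here). The abstract does not name the dissimilarity measures at this point, so the two used below (normalized edit distance and a `difflib` ratio) and the function name `ensemble_dissimilarity` are illustrative assumptions.

```python
from difflib import SequenceMatcher


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via dynamic programming."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def normalized_edit_dissimilarity(target: str, rendered: str) -> float:
    """Edit distance scaled to [0, 1] by the longer string's length."""
    longest = max(len(target), len(rendered))
    return levenshtein(target, rendered) / longest if longest else 0.0


def ratio_dissimilarity(target: str, rendered: str) -> float:
    """One minus difflib's matching-block similarity ratio."""
    return 1.0 - SequenceMatcher(None, target, rendered).ratio()


def ensemble_dissimilarity(target: str, rendered: str) -> float:
    """Hypothetical ensemble: unweighted mean of the two measures above.

    Assumption: both strings are compared case-insensitively after trimming.
    0.0 means the rendered text matches the instruction exactly.
    """
    target, rendered = target.strip().lower(), rendered.strip().lower()
    scores = [
        normalized_edit_dissimilarity(target, rendered),
        ratio_dissimilarity(target, rendered),
    ]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    instructed = "Grand Opening"
    for read_back in ["Grand Opening", "Grand Openning", "Gramd Opnng"]:
        print(read_back, round(ensemble_dissimilarity(instructed, read_back), 3))
```

A lower score indicates that the text recovered from the image is closer to the instructed text. Averaging several string measures is one simple way to build an ensemble; the weighting and the choice of measures here are placeholders.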

Taxonomy

Topics: Image Retrieval and Classification Techniques · Video Analysis and Summarization

Methods: Focus