VLEU: a Method for Automatic Evaluation for Generalizability of Text-to-Image Models

Jingtao Cao; Zheng Zhang; Hongru Wang; Kam-Fai Wong

arXiv:2409.14704·cs.CV·November 18, 2024

TL;DR

VLEU is a novel evaluation metric for Text-to-Image models that measures their ability to handle diverse prompts by analyzing the distribution of generated images relative to input texts using large language models and CLIP.

Contribution

We introduce VLEU, a new metric leveraging large language models and CLIP to assess the generalizability of T2I models across diverse prompts, filling a gap in existing evaluation methods.

Findings

01 VLEU effectively measures T2I model generalization.

02 VLEU correlates well with model finetuning improvements.

03 VLEU distinguishes different T2I models based on prompt diversity.

Abstract

Progress in Text-to-Image (T2I) models has significantly improved the generation of images from textual descriptions. However, existing evaluation metrics do not adequately assess the models' ability to handle a diverse range of textual prompts, which is crucial for their generalizability. To address this, we introduce a new metric called Visual Language Evaluation Understudy (VLEU). VLEU uses large language models to sample from the visual text domain, the set of all possible input texts for T2I models, to generate a wide variety of prompts. The images generated from these prompts are evaluated based on their alignment with the input text using the CLIP model. VLEU quantifies a model's generalizability by computing the Kullback-Leibler divergence between the marginal distribution of the visual text and the conditional distribution of the images generated by the model. This metric…
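The KL-divergence step described in the abstract can be illustrated with a minimal sketch. It assumes a precomputed image-text similarity matrix (for example, CLIP cosine similarities), turns each image's row into a conditional distribution over prompts with a temperature softmax, averages those rows to get the marginal over prompts, and exponentiates the mean KL divergence in the style of the Inception Score. The function names, the `temperature` value, and the exponentiated-mean form are assumptions made for illustration; the abstract does not confirm the paper's exact formula.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def vleu_like_score(sim, temperature=0.01):
    """Hypothetical VLEU-style score (illustrative, not the authors' code).

    sim: array of shape (n_images, n_texts); entry [i, j] is the similarity
         between generated image i and prompt j (e.g. from CLIP).
    temperature: assumed softmax temperature; smaller values sharpen p(t | i).
    """
    sim = np.asarray(sim, dtype=np.float64)
    cond = softmax(sim / temperature, axis=1)   # conditional p(t | image_i)
    marg = cond.mean(axis=0)                    # marginal p(t) over prompts
    eps = 1e-12                                 # guards against log(0)
    kl = (cond * (np.log(cond + eps) - np.log(marg + eps))).sum(axis=1)
    return float(np.exp(kl.mean()))             # assumed Inception-Score-style form


# Each image matches exactly one distinct prompt: score approaches n_texts.
aligned = np.eye(4)
# Every image is equally similar to every prompt: score is 1.
uninformative = np.ones((4, 4))

print(vleu_like_score(aligned))
print(vleu_like_score(uninformative))
```

Under these assumptions the score ranges from 1 (images carry no information about which prompt produced them) up to the number of prompts (each image is matched confidently to its own prompt, and the prompts are covered evenly).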

Code & Models

Repositories

mio7690/VLEU (PyTorch, official)

Videos

VLEU: a Method for Automatic Evaluation for Generalizability of Text-to-Image Models · underline

Taxonomy

Topics: Mathematics, Computing, and Information Processing

Methods: Sparse Evolutionary Training · Contrastive Language-Image Pre-training