Likelihood-Based Text-to-Image Evaluation with Patch-Level Perceptual and Semantic Credit Assignment

Qi Chen; Chaorui Deng; Zixiong Huang; Bowen Zhang; Mingkui Tan; Qi Wu

arXiv:2308.08525·cs.CV·August 17, 2023

TL;DR

This paper introduces a likelihood-based evaluation metric for text-to-image synthesis that assesses perceptual quality and semantic alignment more accurately and efficiently than traditional metrics, using patch-level credit assignment.

Contribution

It proposes a novel likelihood-based evaluation method with patch-level credit assignment to improve assessment of generated images' quality and alignment.

Findings

01. The proposed metric correlates well with human judgment.

02. It requires fewer samples for reliable evaluation.

03. It outperforms traditional metrics like Inception Score and FID.

Abstract

Text-to-image synthesis has made encouraging progress and attracted considerable public attention recently. However, popular evaluation metrics in this area, like the Inception Score and Fréchet Inception Distance, have several issues. First, they cannot explicitly assess the perceptual quality of generated images and poorly reflect the semantic alignment of each text-image pair. Second, they are inefficient and need to sample thousands of images to stabilise their evaluation results. In this paper, we propose to evaluate text-to-image generation performance by directly estimating the likelihood of the generated images using a pre-trained likelihood-based text-to-image generative model, i.e., a higher likelihood indicates better perceptual quality and better text-image alignment. To prevent the likelihood from being dominated by the non-crucial part of the generated image, we propose…
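The idea of keeping non-crucial image regions from dominating a likelihood score can be illustrated with a small sketch. This is not the paper's formulation, which is truncated in the abstract above; it only assumes that a pre-trained model yields one log-likelihood per image patch and that some credit weight is available per patch. The function names, the softmax weighting, and all numbers below are hypothetical and chosen purely for illustration.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def credit_weighted_log_likelihood(patch_log_probs, credits):
    """Credit-weighted average of per-patch log-likelihoods.

    patch_log_probs: one log-likelihood per image patch (assumed to come
        from a pre-trained likelihood-based text-to-image model).
    credits: one non-negative weight per patch (hypothetical; how credits
        are actually computed is not specified in the abstract).
    """
    if len(patch_log_probs) != len(credits):
        raise ValueError("need exactly one credit per patch")
    total = sum(credits)
    if total <= 0:
        raise ValueError("credits must have a positive sum")
    weighted = sum(c * lp for c, lp in zip(credits, patch_log_probs))
    return weighted / total


# Toy data: patches 0 and 2 are "crucial", patches 1 and 3 are background.
log_probs = [-0.2, -3.0, -0.5, -2.5]
uniform = [1.0] * len(log_probs)
credits = softmax([2.0, 0.0, 2.0, 0.0])  # made-up salience scores

plain_score = credit_weighted_log_likelihood(log_probs, uniform)
weighted_score = credit_weighted_log_likelihood(log_probs, credits)
print(plain_score, weighted_score)
```

With uniform credits the score reduces to the plain mean log-likelihood, so poorly modelled background patches pull it down. With credits concentrated on the crucial patches, the background contributes less, which is the effect the abstract motivates.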

Code & Models

Repositories

chenqi008/leica (PyTorch, official)

Taxonomy

Topics: Computer Graphics and Visualization Techniques · Advanced Vision and Imaging · Advanced Optical Imaging Technologies