A Contrastive Compositional Benchmark for Text-to-Image Synthesis: A Study with Unified Text-to-Image Fidelity Metrics

Xiangru Zhu; Penglei Sun; Chengyu Wang; Jingping Liu; Zhixu Li; Yanghua Xiao; Jun Huang

arXiv:2312.02338 · cs.CV · December 12, 2023 · 1 citation

Open Access · 1 Repo

TL;DR

This paper introduces Winoground-T2I, a comprehensive benchmark with contrastive sentence pairs to evaluate and analyze the compositionality and fidelity of text-to-image synthesis models, addressing evaluation inconsistencies.

Contribution

The paper presents a new benchmark for assessing T2I models' compositionality and proposes a strategy for evaluating metric reliability across complex sentence pairs.

Findings

01 Identifies strengths and weaknesses of current T2I models

02 Highlights inconsistencies in existing evaluation metrics

03 Provides a publicly available benchmark for future research

Abstract

Text-to-image (T2I) synthesis has recently achieved significant advancements. However, challenges remain in the model's compositionality, which is the ability to create new combinations from known components. We introduce Winoground-T2I, a benchmark designed to evaluate the compositionality of T2I models. This benchmark includes 11K complex, high-quality contrastive sentence pairs spanning 20 categories. These contrastive sentence pairs with subtle differences enable fine-grained evaluations of T2I synthesis models. Additionally, to address the inconsistency across different metrics, we propose a strategy that evaluates the reliability of various metrics by using comparative sentence pairs. We use Winoground-T2I with a dual objective: to evaluate the performance of T2I models and the metrics used for their evaluation. Finally, we provide insights into the strengths and weaknesses of…
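
The idea of checking a metric against contrastive sentence pairs can be illustrated with a minimal Python sketch. It is not the paper's procedure, which the abstract does not spell out. It assumes a Winoground-style criterion: a metric passes a pair only if each image scores higher with its own sentence than with the contrasting one. The names `pairwise_consistency`, `bag_of_words_score`, and `order_aware_score` are hypothetical, and each "image" is stood in for by a string describing its content so that the example runs without any model.

```python
from typing import Callable, Sequence, Tuple

# A scorer maps (text, image) to a fidelity score. In this sketch an "image"
# is a string describing its content, an assumption made so the example
# runs without any T2I model or vision encoder.
Scorer = Callable[[str, str], float]
Pair = Tuple[str, str, str, str]  # (text_0, image_0, text_1, image_1)


def pair_is_consistent(score: Scorer, pair: Pair) -> bool:
    """True if each image prefers its own sentence over the contrasting one."""
    t0, i0, t1, i1 = pair
    return score(t0, i0) > score(t1, i0) and score(t1, i1) > score(t0, i1)


def pairwise_consistency(score: Scorer, pairs: Sequence[Pair]) -> float:
    """Fraction of contrastive pairs on which the metric is consistent."""
    if not pairs:
        raise ValueError("need at least one contrastive pair")
    return sum(pair_is_consistent(score, p) for p in pairs) / len(pairs)


def bag_of_words_score(text: str, image: str) -> float:
    """Toy metric: Jaccard overlap of word sets, blind to word order."""
    a, b = set(text.split()), set(image.split())
    return len(a & b) / len(a | b)


def order_aware_score(text: str, image: str) -> float:
    """Toy metric: 1.0 only when the description matches the text exactly."""
    return 1.0 if text == image else 0.0


# Hypothetical contrastive pairs: same words, different composition.
PAIRS = [
    ("a dog chasing a cat", "a dog chasing a cat",
     "a cat chasing a dog", "a cat chasing a dog"),
    ("a red cup on a blue table", "a red cup on a blue table",
     "a blue cup on a red table", "a blue cup on a red table"),
]

if __name__ == "__main__":
    print(pairwise_consistency(bag_of_words_score, PAIRS))
    print(pairwise_consistency(order_aware_score, PAIRS))
```

Because both sentences in each pair use the same words, the order-blind toy metric ties on every comparison and fails every pair, while the order-aware one passes them all. Real metrics would fall somewhere in between, which is what makes subtle contrastive pairs a useful probe.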

Code & Models

Repositories

zhuxiangru/winoground-t2i (Official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis