SSD: Towards Better Text-Image Consistency Metric in Text-to-Image Generation

Zhaorui Tan; Xi Yang; Zihan Ye; Qiufeng Wang; Yuyao Yan; Anh Nguyen; Kaizhu Huang

arXiv:2210.15235·cs.CV·December 6, 2022

TL;DR

This paper introduces SSD, a new CLIP-based metric for more accurately measuring text-image consistency in text-to-image generation, and proposes PDF-GAN, a model that improves semantic alignment between text and images.

Contribution

The paper develops SSD, a novel CLIP-based metric grounded in a distributional view of text-image consistency, and PDF-GAN, a new GAN architecture with plug-and-play components that improve text-image semantic consistency.
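To make the phrase "distributional viewpoint" concrete, the sketch below compares a set of image embeddings with a set of text embeddings at two levels: per matched pair (cosine similarity) and per set (gap between per-dimension means and variances). This is a hypothetical illustration only, not the SSD formula from the paper; the function names `l2_normalize` and `distributional_gap_sketch` are invented here, and the embeddings are assumed to be precomputed by a CLIP-style encoder that the sketch does not load.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so dot products equal cosine similarity."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def distributional_gap_sketch(img_emb, txt_emb) -> float:
    """Hypothetical illustration (NOT the paper's SSD definition).

    img_emb, txt_emb: arrays of shape (N, d); row i of each is a matched pair.
    Returns a pair-level semantic term plus a set-level moment-gap term.
    """
    img = l2_normalize(np.asarray(img_emb, dtype=float))
    txt = l2_normalize(np.asarray(txt_emb, dtype=float))

    # Pair-level term: 1 - mean cosine similarity of matched image/text pairs.
    semantic = 1.0 - float(np.mean(np.sum(img * txt, axis=1)))

    # Set-level term: squared gaps between per-dimension means and variances,
    # i.e. a crude comparison of the first two moments of the two distributions.
    mean_gap = float(np.sum((img.mean(axis=0) - txt.mean(axis=0)) ** 2))
    var_gap = float(np.sum((img.var(axis=0) - txt.var(axis=0)) ** 2))

    return semantic + mean_gap + var_gap
```

With identical embedding sets both terms vanish. With orthogonal pairs drawn from the same overall distribution, only the pair-level term contributes, which is the kind of case a purely distribution-level score would miss.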

Findings

01 SSD correlates better with human judgment of text-image consistency.

02 PDF-GAN achieves superior text-image alignment on benchmark datasets.

03 The proposed methods outperform existing state-of-the-art methods on consistency metrics.

Abstract

Generating consistent and high-quality images from given texts is essential for visual-language understanding. Although impressive results have been achieved in generating high-quality images, text-image consistency is still a major concern in existing GAN-based methods. In particular, the most popular metric, $R$-precision, may not accurately reflect text-image consistency, often resulting in very misleading semantics in the generated images. Despite its significance, how to design a better text-image consistency metric surprisingly remains under-explored in the community. In this paper, we take a further step by developing a novel CLIP-based metric termed Semantic Similarity Distance ($SSD$), which is both theoretically founded from a distributional viewpoint and empirically verified on benchmark datasets. Benefiting from the proposed metric, we further design the Parallel…
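The abstract's critique of $R$-precision is easier to follow with the usual recipe in view. The sketch below assumes the common AttnGAN-style protocol: each generated image is scored against its own caption plus randomly drawn mismatched captions (100 candidates in total), candidates are ranked by cosine similarity, and with $R = 1$ the score is the fraction of images whose true caption ranks first. The helper names `l2_normalize` and `r_precision` are hypothetical, and the embeddings are assumed to come from a pretrained image/text encoder that this sketch does not load.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so dot products equal cosine similarity."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def r_precision(img_emb, txt_emb, num_candidates: int = 100, seed: int = 0) -> float:
    """R-precision with R = 1 (sketch of the common protocol, not the paper's code).

    img_emb, txt_emb: arrays of shape (N, d); caption i is the true caption of image i.
    For each image, rank its true caption against num_candidates - 1 mismatched
    captions and count a hit when the true caption scores highest.
    """
    img = l2_normalize(np.asarray(img_emb, dtype=float))
    txt = l2_normalize(np.asarray(txt_emb, dtype=float))
    n = img.shape[0]
    if num_candidates > n:
        raise ValueError("need at least num_candidates captions in the pool")

    rng = np.random.default_rng(seed)
    hits = 0
    for i in range(n):
        # Sample mismatched captions from every index except the true one.
        others = np.delete(np.arange(n), i)
        negatives = rng.choice(others, size=num_candidates - 1, replace=False)
        candidates = np.concatenate(([i], negatives))
        # Cosine similarity of image i with each candidate caption.
        scores = txt[candidates] @ img[i]
        # Position 0 holds the true caption; argmax breaks ties toward it.
        hits += int(np.argmax(scores) == 0)
    return hits / n
```

Because the score only checks whether the true caption wins a ranking, an image can obtain a hit while still depicting wrong details, which is consistent with the abstract's point that $R$-precision may not reflect text-image consistency accurately.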

Code & Models

Repositories

zhaorui-tan/pdf-gan (PyTorch, official)

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Image Retrieval and Classification Techniques