SSD: Towards Better Text-Image Consistency Metric in Text-to-Image Generation
Zhaorui Tan, Xi Yang, Zihan Ye, Qiufeng Wang, Yuyao Yan, Anh Nguyen, Kaizhu Huang

TL;DR
This paper introduces SSD, a new CLIP-based metric for better measuring text-image consistency in generation, and proposes PDF-GAN, a model that improves semantic alignment between text and images.
Contribution
The paper develops a novel distributionally-founded CLIP-based metric SSD and a new GAN architecture PDF-GAN with plug-and-play components for enhanced text-image semantic consistency.
Findings
SSD correlates better with human judgment of consistency
PDF-GAN achieves superior text-image alignment on benchmark datasets
Proposed methods outperform existing state-of-the-art in consistency metrics
Abstract
Generating consistent and high-quality images from given texts is essential for visual-language understanding. Although impressive results have been achieved in generating high-quality images, text-image consistency is still a major concern in existing GAN-based methods. Particularly, the most popular metric R-precision may not accurately reflect the text-image consistency, often resulting in very misleading semantics in the generated images. Albeit its significance, how to design a better text-image consistency metric surprisingly remains under-explored in the community. In this paper, we make a further step forward to develop a novel CLIP-based metric termed as Semantic Similarity Distance (SSD), which is both theoretically founded from a distributional viewpoint and empirically verified on benchmark datasets. Benefiting from the proposed metric, we further design the Parallel…
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Image Retrieval and Classification Techniques
