If at First You Don't Succeed, Try, Try Again: Faithful Diffusion-based Text-to-Image Generation by Selection

Shyamgopal Karthik; Karsten Roth; Massimiliano Mancini; Zeynep Akata

arXiv:2305.13308 · cs.CV · May 23, 2023 · 2 cites

TL;DR

This paper demonstrates that large diffusion-based text-to-image models are inherently more faithful to prompts than previously thought and proposes a candidate selection approach to improve faithfulness efficiently.

Contribution

It introduces a simple candidate selection pipeline that enhances faithfulness in T2I models without retraining or complex modifications.

Findings

1. Outperforms post-hoc methods on faithfulness metrics
2. Achieves comparable or lower computational cost than those post-hoc methods
3. Validated through quantitative evaluations and user studies

Abstract

Despite their impressive capabilities, diffusion-based text-to-image (T2I) models can lack faithfulness to the text prompt, where generated images may not contain all the mentioned objects, attributes or relations. To alleviate these issues, recent works proposed post-hoc methods to improve model faithfulness without costly retraining, by modifying how the model utilizes the input prompt. In this work, we take a step back and show that large T2I diffusion models are more faithful than usually assumed, and can generate images faithful to even complex prompts without the need to manipulate the generative process. Based on that, we show how faithfulness can be simply treated as a candidate selection problem instead, and introduce a straightforward pipeline that generates candidate images for a text prompt and picks the best one according to an automatic scoring system that can leverage…
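The selection step described in the abstract can be read as best-of-N sampling: generate one candidate per random seed, score each candidate against the prompt, and keep the top-scoring image. What follows is a minimal sketch under stated assumptions, not the official imageselect code. The names `select_best_candidate`, `toy_generate`, and `toy_score` are hypothetical, and the toy scorer only stands in for the automatic scoring system, whose details are cut off in the abstract.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

Image = TypeVar("Image")


def select_best_candidate(
    prompt: str,
    generate: Callable[[str, int], Image],
    score: Callable[[str, Image], float],
    seeds: Sequence[int],
) -> Tuple[Image, float, List[float]]:
    """Generate one candidate per seed and return the highest-scoring one.

    `generate` and `score` are hypothetical stand-ins, e.g. a seeded T2I
    diffusion sampler and an automatic prompt-faithfulness scorer.
    """
    if not seeds:
        raise ValueError("need at least one seed")
    # The generative process is left untouched: only the seed varies.
    candidates = [generate(prompt, seed) for seed in seeds]
    scores = [score(prompt, image) for image in candidates]
    # Ties resolve to the earliest candidate, since max() keeps the first maximum.
    best_idx = max(range(len(scores)), key=scores.__getitem__)
    return candidates[best_idx], scores[best_idx], scores


# Toy usage (hypothetical): "images" are strings tagged with their seed,
# and the toy score peaks at seed 3.
def toy_generate(prompt: str, seed: int) -> str:
    return f"{prompt}#seed{seed}"


def toy_score(prompt: str, image: str) -> float:
    seed = int(image.rsplit("seed", 1)[1])
    return -abs(seed - 3)


best, best_score, all_scores = select_best_candidate(
    "a red cube on a blue sphere", toy_generate, toy_score, seeds=range(6)
)
print(best, best_score)
```

In a real pipeline, `generate` would wrap a diffusion sampler seeded per candidate and `score` would be an automatic faithfulness metric. The loop only chooses among samples, which matches the abstract's point that faithfulness can be handled by selection rather than by manipulating the generative process.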

Code & Models

Repositories

explainableml/imageselect (PyTorch, official)

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Video Analysis and Summarization

Methods: Diffusion