FuseDream: Training-Free Text-to-Image Generation with Improved CLIP+GAN Space Optimization

Xingchao Liu; Chengyue Gong; Lemeng Wu; Shujian Zhang; Hao Su; Qiang Liu

arXiv:2112.01573 · cs.CV · January 2, 2022 · 36 cites

TL;DR

FuseDream enhances zero-shot text-to-image generation by optimizing in GAN latent space with robust CLIP scoring, novel initialization, and image composition techniques, achieving high-quality results without training.

Contribution

The paper introduces FuseDream, a training-free pipeline that improves CLIP+GAN image generation through improved optimization, augmentation, and composition strategies.

Findings

01. Achieves top Inception and FID scores on MS COCO.

02. Generates diverse, high-quality images from text prompts.

03. Extends GAN capabilities with novel composition and optimization methods.

Abstract

Generating images from natural language instructions is an intriguing yet highly challenging task. We approach text-to-image generation by combining the power of the pretrained CLIP representation with an off-the-shelf image generator (GANs), optimizing in the latent space of the GAN to find images that achieve maximum CLIP score with the given input text. Compared to traditional methods that train generative models from text to image starting from scratch, the CLIP+GAN approach is training-free, zero-shot, and can be easily customized with different generators. However, optimizing the CLIP score in the GAN space poses a highly challenging optimization problem, and off-the-shelf optimizers such as Adam fail to yield satisfying results. In this work, we propose a FuseDream pipeline, which improves the CLIP+GAN approach with three key techniques: 1) an AugCLIP score which robustifies the CLIP…
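
The core CLIP+GAN idea described in the abstract, searching a generator's latent space for the point whose output best matches a text embedding, can be illustrated with a toy sketch. Everything here is an assumption made for illustration: `toy_generator` is a hypothetical stand-in for a pretrained GAN followed by an image encoder, `TEXT_EMB` is a random vector standing in for a CLIP text embedding, and the optimizer is plain gradient ascent with step halving and finite-difference gradients. It is not the FuseDream pipeline or the code in the official repository, and it uses only the Python standard library.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-12)


def toy_generator(z, weights):
    """Hypothetical stand-in for 'GAN + image encoder': maps a latent z to an embedding."""
    return [math.tanh(sum(w * zi for w, zi in zip(row, z))) for row in weights]


def numerical_grad(f, z, eps=1e-5):
    """Central finite-difference gradient of f at z (keeps the sketch generator-agnostic)."""
    grad = []
    for i in range(len(z)):
        z_plus, z_minus = list(z), list(z)
        z_plus[i] += eps
        z_minus[i] -= eps
        grad.append((f(z_plus) - f(z_minus)) / (2 * eps))
    return grad


def optimize_latent(weights, text_emb, dim, steps=200, lr=0.5, seed=0):
    """Gradient ascent on the similarity score; a step is kept only if it improves the score."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(dim)]

    def score(v):
        return cosine(toy_generator(v, weights), text_emb)

    initial = best = score(z)
    for _ in range(steps):
        grad = numerical_grad(score, z)
        candidate = [zi + lr * gi for zi, gi in zip(z, grad)]
        candidate_score = score(candidate)
        if candidate_score > best:
            z, best = candidate, candidate_score
        else:
            lr *= 0.5  # overshoot: shrink the step and try again
    return z, initial, best


# Illustrative sizes and random values; none of these come from the paper.
_rng = random.Random(42)
LATENT_DIM, EMB_DIM = 4, 6
WEIGHTS = [[_rng.gauss(0, 1) for _ in range(LATENT_DIM)] for _ in range(EMB_DIM)]
TEXT_EMB = [_rng.gauss(0, 1) for _ in range(EMB_DIM)]

z_opt, start_score, final_score = optimize_latent(WEIGHTS, TEXT_EMB, LATENT_DIM)
print(f"score before: {start_score:.3f}, after: {final_score:.3f}")
```

In this toy setting the objective is smooth and low-dimensional, so a simple ascent loop is enough. The abstract's point is that the real CLIP score over a real GAN latent space is much harder to optimize, which is the motivation for the techniques the paper proposes.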

Code & Models

Repositories

gnobitab/fusedream · PyTorch · Official

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications

Methods: Contrastive Language-Image Pre-training