CLIP2GAN: Towards Bridging Text with the Latent Space of GANs
Yixuan Wang, Wengang Zhou, Jianmin Bao, Weilun Wang, Li Li, Houqiang Li

TL;DR
CLIP2GAN is a novel framework that leverages CLIP and StyleGAN to enable flexible text-guided image generation and attribute editing by bridging CLIP's feature space with StyleGAN's latent space.
Contribution
We introduce CLIP2GAN, a new method that maps CLIP features to StyleGAN's latent space, allowing effective text-to-image synthesis and attribute editing.
Findings
Outperforms previous methods in text-guided image generation.
Enables attribute editing by manipulating mapped text features.
Uses self-supervised learning to optimize the mapping network.
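One way to read the attribute-editing finding is as latent arithmetic. This reading is an assumption, because the abstract is truncated before it describes editing. Under it, a neutral caption and a target caption are both passed through the mapping network, and a latent code w is moved along the difference of the two mapped codes. In the sketch below, TEXT_EMB (stand-in text-encoder outputs), the matrix M (a stand-in for a trained mapping network), the captions, and edit_latent are all hypothetical. The edited latent would then be decoded by StyleGAN, which is not modeled here.

```python
# Hypothetical toy components (assumptions, not the paper's networks).
TEXT_EMB = {  # stand-in "CLIP text encoder" outputs
    "a photo of a face": [0.5, 0.1, -0.2],
    "a photo of a smiling face": [0.5, 0.6, -0.2],
}
M = [[1.0, 0.2, 0.0], [0.0, 0.9, 0.1], [0.1, 0.0, 1.1]]  # stand-in trained mapping network


def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def edit_latent(w, neutral, target, alpha=1.0):
    """Shift latent w along the direction between two mapped text embeddings."""
    w_target = matvec(M, TEXT_EMB[target])
    w_neutral = matvec(M, TEXT_EMB[neutral])
    direction = [t - n for t, n in zip(w_target, w_neutral)]
    return [wi + alpha * di for wi, di in zip(w, direction)]


edited = edit_latent([0.0, 0.0, 0.0], "a photo of a face", "a photo of a smiling face")
```

Setting alpha = 0 returns w unchanged. Larger values push the latent further toward the target caption.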
Abstract
In this work, we address text-guided image generation and propose a novel framework, CLIP2GAN, that leverages the CLIP model and StyleGAN. The key idea of CLIP2GAN is to bridge the output feature embedding space of CLIP and the input latent space of StyleGAN, which is realized by introducing a mapping network. In the training stage, we encode an image with CLIP and map the output feature to a latent code, which is then used to reconstruct the image. In this way, the mapping network is optimized in a self-supervised manner. In the inference stage, since CLIP embeds both images and text into a shared feature embedding space, we replace the CLIP image encoder in the training architecture with the CLIP text encoder, while keeping the subsequent mapping network and the StyleGAN model. As a result, we can flexibly input a text description to generate an image. Moreover, by…
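The training and inference stages described in the abstract can be illustrated with a toy sketch. The sketch makes several assumptions. Every frozen component is a small linear map: G stands in for StyleGAN, E_IMG for the CLIP image encoder, and TEXT_EMB is an idealized lookup standing in for the CLIP text encoder. The training images are themselves sampled from G, so perfect reconstruction is possible. The loss is a plain squared pixel error. The paper's actual networks and losses are not reproduced. Only the matrix M, a hypothetical stand-in for the mapping network, is trained, by SGD on the reconstruction error, which mirrors the self-supervised setup. At inference, encode_text replaces encode_image while M and G are reused unchanged.

```python
import random

# Hypothetical toy stand-ins (assumptions): frozen linear maps instead of the
# pretrained StyleGAN generator and CLIP encoders used by CLIP2GAN.
G = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.0], [0.3, 0.0, 1.0], [0.0, 0.2, 0.0]]  # latent (3) -> image (4)
E_IMG = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.5], [0.0, 0.0, 1.0, 0.0]]  # image (4) -> embedding (3)
TEXT_EMB = {"a smiling face": [0.4, -0.3, 0.8]}  # hypothetical text embeddings in the shared space


def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def generate(w):  # frozen "StyleGAN"
    return matvec(G, w)


def encode_image(x):  # frozen "CLIP image encoder"
    return matvec(E_IMG, x)


def encode_text(caption):  # frozen "CLIP text encoder" (idealized lookup)
    return TEXT_EMB[caption]


def train_mapping(steps=10000, lr=0.05, seed=0):
    """Self-supervised training of M: image -> embedding -> M -> latent -> G -> image."""
    rng = random.Random(seed)
    M = [[0.0] * 3 for _ in range(3)]  # the only trainable parameters
    for _ in range(steps):
        # Assumption: training images are sampled from G, so exact reconstruction is possible.
        x = generate([rng.uniform(-1.0, 1.0) for _ in range(3)])
        e = encode_image(x)
        residual = [r - t for r, t in zip(generate(matvec(M, e)), x)]
        # Gradient of ||G M e - x||^2 w.r.t. M is 2 (G^T residual) e^T; G and E_IMG stay frozen.
        g_latent = [2.0 * sum(G[i][k] * residual[i] for i in range(4)) for k in range(3)]
        for k in range(3):
            for j in range(3):
                M[k][j] -= lr * g_latent[k] * e[j]
    return M


def text_to_image(caption, M):
    # Inference: swap the image encoder for the text encoder; M and G are reused as-is.
    return generate(matvec(M, encode_text(caption)))
```

Usage: M = train_mapping() followed by text_to_image("a smiling face", M). Because the toy encoder and generator compose to an invertible map, M converges to its inverse. As a result, the generated image encodes back to approximately the caption's embedding. The linear M only illustrates the data flow between the frozen encoders, the mapping network, and the generator.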
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Handwritten Text Recognition Techniques
Methods: StyleGAN · Dense Connections · Convolution · Feedforward Network · R1 Regularization · Adaptive Instance Normalization · Contrastive Language-Image Pre-training
