GALIP: Generative Adversarial CLIPs for Text-to-Image Synthesis

Ming Tao; Bing-Kun Bao; Hao Tang; Changsheng Xu

arXiv:2301.12959 · cs.CV · January 31, 2023 · 6 cites

TL;DR

GALIP introduces a novel generative adversarial framework leveraging CLIP for efficient, controllable, and high-quality text-to-image synthesis, significantly reducing data and parameter requirements while increasing speed.

Contribution

The paper presents GALIP, a CLIP-based GAN model that improves efficiency, control, and speed in text-to-image synthesis compared to existing large models.

Findings

1. Requires only 3% of the training data used by large models
2. Synthesizes images 120 times faster than large models
3. Maintains high image quality and controllability

Abstract

Synthesizing high-fidelity complex images from text is challenging. Based on large pretraining, the autoregressive and diffusion models can synthesize photo-realistic images. Although these large models have shown notable progress, there remain three flaws. 1) These models require tremendous training data and parameters to achieve good performance. 2) The multi-step generation design slows the image synthesis process heavily. 3) The synthesized visual features are difficult to control and require delicately designed prompts. To enable high-quality, efficient, fast, and controllable text-to-image synthesis, we propose Generative Adversarial CLIPs, namely GALIP. GALIP leverages the powerful pretrained CLIP model both in the discriminator and generator. Specifically, we propose a CLIP-based discriminator. The complex scene understanding ability of CLIP enables the discriminator to…
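The abstract's idea of building a discriminator on top of a pretrained CLIP model can be illustrated with a generic "frozen encoder plus small trainable head" pattern. The sketch below is not the authors' architecture: `FrozenEncoder` is a hypothetical stand-in (a fixed random projection, not CLIP), `DiscriminatorHead` is a hypothetical linear head, the half squared-error objective is a simplification rather than the paper's loss, and keeping the encoder frozen is an assumption that the text above does not state.

```python
import math
import random


class FrozenEncoder:
    """Hypothetical stand-in for a pretrained image encoder such as CLIP's.

    Its weights are fixed at construction and never updated.
    """

    def __init__(self, in_dim, feat_dim, seed=0):
        rng = random.Random(seed)
        scale = 1.0 / math.sqrt(in_dim)
        self.weights = [
            [rng.gauss(0.0, scale) for _ in range(in_dim)]
            for _ in range(feat_dim)
        ]

    def __call__(self, image):
        # Linear projection followed by tanh; a real encoder is a deep network.
        return [
            math.tanh(sum(w * x for w, x in zip(row, image)))
            for row in self.weights
        ]


class DiscriminatorHead:
    """Small trainable head scoring an (image features, text embedding) pair."""

    def __init__(self, feat_dim, text_dim):
        self.weights = [0.0] * (feat_dim + text_dim)
        self.bias = 0.0

    def score(self, features, text_emb):
        # Concatenate image features with the text condition, then apply a linear map.
        joint = features + text_emb
        return sum(w * v for w, v in zip(self.weights, joint)) + self.bias

    def sgd_step(self, features, text_emb, target, lr=0.05):
        # One gradient step on 0.5 * (score - target)^2.
        # Only the head's parameters change; the encoder is never touched.
        joint = features + text_emb
        err = self.score(features, text_emb) - target
        self.weights = [w - lr * err * v for w, v in zip(self.weights, joint)]
        self.bias -= lr * err
        return 0.5 * err * err


# Toy inputs (made up for illustration only).
encoder = FrozenEncoder(in_dim=16, feat_dim=8)
head = DiscriminatorHead(feat_dim=8, text_dim=4)
image = [0.5] * 16
text_emb = [0.1, -0.2, 0.3, 0.4]

encoder_before = [row[:] for row in encoder.weights]
features = encoder(image)
first_loss = head.sgd_step(features, text_emb, target=1.0)
second_loss = head.sgd_step(features, text_emb, target=1.0)
```

In this toy setup the head learns to push the score of a matching image-text pair toward the target while the encoder's weights stay exactly as they were, which is the design property the pattern relies on: the pretrained features are reused as-is and only a small number of parameters are trained.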

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Advanced Image and Video Retrieval Techniques · Image Processing and 3D Reconstruction

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Diffusion · Contrastive Language-Image Pre-training