Scaling up GANs for Text-to-Image Synthesis

Minguk Kang; Jun-Yan Zhu; Richard Zhang; Jaesik Park; Eli Shechtman; Sylvain Paris; Taesung Park

arXiv:2303.05511 · cs.CV · June 21, 2023 · 5 cites

TL;DR

GigaGAN is a new large-scale GAN architecture that enables fast, high-resolution text-to-image synthesis and supports advanced editing, challenging the dominance of diffusion and auto-regressive models.

Contribution

The paper introduces GigaGAN, a GAN architecture that scales past the point at which naively enlarged StyleGAN models become unstable, enabling fast, high-resolution text-to-image synthesis trained on large datasets such as LAION.

Findings

01. GigaGAN synthesizes a 512px image in 0.13 seconds.

02. It produces a 16-megapixel image in 3.66 seconds.

03. It supports various latent-space editing techniques.
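
The two timing figures above can be put on a common scale by converting them to pixels generated per second. The following is a minimal sketch of that arithmetic, not a benchmark. It assumes that "512px" means a square 512x512 image and that "16-megapixel" means a square 4096x4096 image; the helper name `megapixels_per_second` is hypothetical and introduced only for illustration.

```python
def megapixels_per_second(width: int, height: int, seconds: float) -> float:
    """Return the number of generated pixels per second, in units of 10^6 pixels."""
    return (width * height) / seconds / 1e6


# Assumption: "512px" refers to a square 512x512 output.
base_rate = megapixels_per_second(512, 512, 0.13)

# Assumption: "16-megapixel" refers to a square 4096x4096 output.
large_rate = megapixels_per_second(4096, 4096, 3.66)

print(f"512px image:  {base_rate:.2f} MP/s")
print(f"16 MP image:  {large_rate:.2f} MP/s")
```

Under these assumptions the larger output corresponds to a higher pixel rate, but the two figures may come from different pipeline configurations, so the comparison is only indicative.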

Abstract

The recent success of text-to-image synthesis has taken the world by storm and captured the general public's imagination. From a technical standpoint, it also marked a drastic change in the favored architecture to design generative image models. GANs used to be the de facto choice, with techniques like StyleGAN. With DALL-E 2, auto-regressive and diffusion models became the new standard for large-scale generative models overnight. This rapid shift raises a fundamental question: can we scale up GANs to benefit from large datasets like LAION? We find that naïvely increasing the capacity of the StyleGAN architecture quickly becomes unstable. We introduce GigaGAN, a new GAN architecture that far exceeds this limit, demonstrating GANs as a viable option for text-to-image synthesis. GigaGAN offers three major advantages. First, it is orders of magnitude faster at inference time, taking only…

Code & Models

Repositories

lucidrains/gigagan-pytorch
pytorch

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Advanced Image and Video Retrieval Techniques · Image Processing and 3D Reconstruction

Methods: Convolution · Dense Connections · Adaptive Instance Normalization · Diffusion · R1 Regularization · Feedforward Network · StyleGAN