StyleGAN-T: Unlocking the Power of GANs for Fast Large-Scale Text-to-Image Synthesis

Axel Sauer; Tero Karras; Samuli Laine; Andreas Geiger; Timo Aila

arXiv:2301.09515 · cs.LG · January 24, 2023 · 61 cites

TL;DR

StyleGAN-T is a GAN-based model for fast, large-scale text-to-image synthesis. It generates an image in a single forward pass, improves on previous GANs, and surpasses distilled diffusion models in sample quality.

Contribution

The paper introduces StyleGAN-T, a GAN architecture tailored for large-scale text-to-image synthesis, addressing stability, capacity, and alignment challenges to compete with diffusion models.

Findings

01

StyleGAN-T outperforms previous GANs in quality and speed.

02

StyleGAN-T surpasses distilled diffusion models in sample quality.

03

The model demonstrates stable training on diverse datasets.

Abstract

Text-to-image synthesis has recently seen significant progress thanks to large pretrained language models, large-scale training data, and the introduction of scalable model families such as diffusion and autoregressive models. However, the best-performing models require iterative evaluation to generate a single sample. In contrast, generative adversarial networks (GANs) only need a single forward pass. They are thus much faster, but they currently remain far behind the state-of-the-art in large-scale text-to-image synthesis. This paper aims to identify the necessary steps to regain competitiveness. Our proposed model, StyleGAN-T, addresses the specific requirements of large-scale text-to-image synthesis, such as large capacity, stable training on diverse datasets, strong text alignment, and controllable variation vs. text alignment tradeoff. StyleGAN-T significantly improves over…

Code & Models

Repositories

autonomousvision/stylegan-t
PyTorch · Official

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
