StyleGAN-T: Unlocking the Power of GANs for Fast Large-Scale Text-to-Image Synthesis
Axel Sauer, Tero Karras, Samuli Laine, Andreas Geiger, Timo Aila

TL;DR
StyleGAN-T is a GAN-based model for fast, large-scale text-to-image synthesis. It outperforms previous GANs in quality and speed, and surpasses distilled diffusion models in sample quality.
Contribution
The paper introduces StyleGAN-T, a GAN architecture tailored for large-scale text-to-image synthesis, addressing stability, capacity, and alignment challenges to compete with diffusion models.
Findings
StyleGAN-T outperforms previous GANs in quality and speed.
StyleGAN-T surpasses distilled diffusion models in sample quality.
The model demonstrates stable training on diverse datasets.
Abstract
Text-to-image synthesis has recently seen significant progress thanks to large pretrained language models, large-scale training data, and the introduction of scalable model families such as diffusion and autoregressive models. However, the best-performing models require iterative evaluation to generate a single sample. In contrast, generative adversarial networks (GANs) only need a single forward pass. They are thus much faster, but they currently remain far behind the state-of-the-art in large-scale text-to-image synthesis. This paper aims to identify the necessary steps to regain competitiveness. Our proposed model, StyleGAN-T, addresses the specific requirements of large-scale text-to-image synthesis, such as large capacity, stable training on diverse datasets, strong text alignment, and controllable variation vs. text alignment tradeoff. StyleGAN-T significantly improves over…
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
