TransGAN: Two Pure Transformers Can Make One Strong GAN, and That Can Scale Up
Yifan Jiang, Shiyu Chang, Zhangyang Wang

TL;DR
TransGAN shows that a GAN built entirely from transformers, with no convolutions, can generate images competitively with convolutional GANs, including at high resolution. It also introduces techniques to stabilize training.
Contribution
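This work presents a first pilot study of a GAN built entirely from transformers. It proposes a memory-friendly generator that progressively increases feature resolution, a multi-scale discriminator, and a grid self-attention module that eases the memory bottleneck, along with training methods that enable high-quality image synthesis without convolutions.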
Findings
TransGAN achieves state-of-the-art Inception Score and FID on STL-10.
It produces diverse, high-fidelity images at 256x256 resolution.
On STL-10, it outperforms convolutional GANs such as StyleGAN-V2.
Abstract
The recent explosive interest in transformers has suggested their potential to become powerful "universal" models for computer vision tasks, such as classification, detection, and segmentation. While those attempts mainly study the discriminative models, we explore transformers on some more notoriously difficult vision tasks, e.g., generative adversarial networks (GANs). Our goal is to conduct the first pilot study in building a GAN completely free of convolutions, using only pure transformer-based architectures. Our vanilla GAN architecture, dubbed TransGAN, consists of a memory-friendly transformer-based generator that progressively increases feature resolution, and correspondingly a multi-scale discriminator to simultaneously capture semantic contexts and low-level textures. On top of them, we introduce the new module of grid self-attention for alleviating the memory bottleneck…
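To make the memory bottleneck mentioned in the abstract concrete, the sketch below counts attention-matrix entries for one head in one layer, ignoring batch size. It assumes that grid self-attention restricts each token to a non-overlapping local window of the feature map. The function names and the 16x16 window size are illustrative assumptions and are not taken from the paper.

```python
def full_attention_entries(height: int, width: int) -> int:
    """Attention-matrix entries when every token attends to every token."""
    tokens = height * width
    return tokens * tokens


def grid_attention_entries(height: int, width: int, grid: int) -> int:
    """Entries when attention stays inside non-overlapping grid x grid windows.

    Assumption: the window partition is a regular, non-overlapping tiling.
    """
    if height % grid or width % grid:
        raise ValueError("grid must divide both height and width")
    num_windows = (height // grid) * (width // grid)
    tokens_per_window = grid * grid
    return num_windows * tokens_per_window * tokens_per_window


def window_of(row: int, col: int, width: int, grid: int) -> int:
    """Index of the window that contains the token at (row, col)."""
    windows_per_row = width // grid
    return (row // grid) * windows_per_row + (col // grid)


if __name__ == "__main__":
    h = w = 256   # feature-map size matching the 256x256 resolution above
    g = 16        # hypothetical window size, chosen only for illustration
    full = full_attention_entries(h, w)
    local = grid_attention_entries(h, w, g)
    print(f"full attention entries: {full}")
    print(f"grid attention entries: {local}")
    print(f"reduction factor:       {full // local}")
```

Under these assumptions the cost falls from (HW)^2 to HW·g^2 entries, a reduction by a factor of HW/g^2. That is why windowed attention becomes attractive as the generator's feature resolution grows.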
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Digital Media Forensic Detection · Advanced Image Processing Techniques
