The Nuts and Bolts of Adopting Transformer in GANs
Rui Xu, Xiangyu Xu, Kai Chen, Bolei Zhou, Chen Change Loy

TL;DR
This paper empirically investigates the integration of Transformer architectures into GANs for high-fidelity image synthesis, revealing insights on feature locality and residual connections, and proposing a CNN-free generator called STrans-G.
Contribution
It provides a comprehensive empirical analysis of Transformer properties in GANs and introduces a novel CNN-free generator architecture, STrans-G, with competitive performance.
Findings
Feature locality remains important in image generation.
Residual connections in self-attention layers can be harmful for training.
The proposed STrans-G achieves competitive results in image synthesis.
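The first two findings can be made concrete with a toy example. The sketch below is not the paper's STrans-G code; it is a minimal, single-head self-attention over a 1-D token sequence, written under simplifying assumptions (no learned projections, no normalization layers). The names `local_self_attention`, `window`, and `use_residual` are hypothetical and chosen only for illustration. `window` restricts each token to nearby positions (feature locality), and `use_residual` toggles the skip connection around the attention output.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def local_self_attention(tokens, window, use_residual=True):
    """Toy single-head self-attention restricted to a local window.

    tokens: list of feature vectors (lists of floats). For brevity they
            serve directly as queries, keys, and values (an assumption;
            real layers use learned projections).
    window: each token attends only to positions at most `window` steps away.
    use_residual: if True, add the input token back to the attention output.
    """
    n, dim = len(tokens), len(tokens[0])
    scale = 1.0 / math.sqrt(dim)
    outputs = []
    for i, query in enumerate(tokens):
        # Feature locality: only neighbours inside the window are visible.
        lo, hi = max(0, i - window), min(n, i + window + 1)
        neighbours = tokens[lo:hi]
        scores = [scale * sum(q * k for q, k in zip(query, key))
                  for key in neighbours]
        weights = softmax(scores)
        attended = [sum(w * v[d] for w, v in zip(weights, neighbours))
                    for d in range(dim)]
        if use_residual:
            # Skip connection around the attention output.
            attended = [a + x for a, x in zip(attended, query)]
        outputs.append(attended)
    return outputs


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
with_skip = local_self_attention(tokens, window=1, use_residual=True)
without_skip = local_self_attention(tokens, window=1, use_residual=False)
```

Comparing `with_skip` and `without_skip` shows that the two variants differ exactly by the input tokens. Whether dropping or modifying that skip path helps training is an empirical question that the paper studies; this sketch only shows where the switch sits.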
Abstract
Transformer has become prevalent in computer vision, especially for high-level vision tasks. However, adopting Transformer in the generative adversarial network (GAN) framework remains an open and challenging problem. In this paper, we conduct a comprehensive empirical study to investigate the properties of Transformer in GAN for high-fidelity image synthesis. Our analysis highlights and reaffirms the importance of feature locality in image generation, although the merits of locality are already well known in classification tasks. Perhaps more interestingly, we find the residual connections in self-attention layers harmful for learning Transformer-based discriminators and conditional generators. We carefully examine their influence and propose effective ways to mitigate the negative impacts. Our study leads to a new alternative design of Transformers in GAN, a convolutional neural network…
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Image Processing Techniques and Applications · Cell Image Analysis Techniques
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Absolute Position Encodings · Dense Connections · Byte Pair Encoding · Position-Wise Feed-Forward Layer · Label Smoothing · Adam · Dropout
