Combining Transformer Generators with Convolutional Discriminators
Ricard Durall, Stanislav Frolov, Jörn Hees, Federico Raue, Franz-Josef Pfreundt, Andreas Dengel, Janis Keuper

TL;DR
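This paper explores a hybrid GAN architecture that combines a transformer-based generator with a convolutional discriminator. The combination removes the need for the additional training tricks TransGAN relies on (data augmentation, an auxiliary super-resolution task, and a masking prior), and it improves both image synthesis quality and the spectral properties of the generated images.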
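"Spectral properties" refers to how closely the frequency content of generated images matches that of real images. One common way to inspect this is the azimuthally averaged power spectrum, which reduces the 2D FFT of an image to a 1D profile from low to high frequency. This is an assumption on our part, not the paper's stated evaluation protocol. The function name `azimuthal_power_spectrum` and the log-power scaling are illustrative choices.

```python
import numpy as np

def azimuthal_power_spectrum(image: np.ndarray) -> np.ndarray:
    """Return the 1D (ring-averaged) log power spectrum of a 2D grayscale image.

    Illustrative sketch only; not taken from the paper.
    """
    if image.ndim != 2:
        raise ValueError("expected a 2D grayscale image")
    # Shift the zero frequency to the centre of the spectrum.
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    # Log scale keeps the weak high-frequency bins visible next to the DC term.
    power = np.log1p(np.abs(spectrum) ** 2)
    h, w = power.shape
    y, x = np.indices((h, w))
    # Integer distance of every bin from the centre = frequency "ring".
    radius = np.hypot(y - h // 2, x - w // 2).astype(int)
    sums = np.bincount(radius.ravel(), weights=power.ravel())
    counts = np.bincount(radius.ravel())
    # Mean power per ring; guard against empty rings just in case.
    return np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)
```

To compare a generator against real data, average this profile over a batch of real images and over a batch of generated ones, then look at where the curves diverge. GAN artifacts typically show up at the high-frequency end.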
Contribution
It introduces a novel hybrid model that pairs a transformer-based generator with a convolutional (CNN) discriminator, improving image synthesis without extra training complexity.
Findings
Hybrid model outperforms pure transformer GANs in quality
Combining architectures retains attention benefits in generated images
Removing auxiliary tasks simplifies training process
Abstract
Transformer models have recently attracted much interest from computer vision researchers and have since been successfully employed for several problems traditionally addressed with convolutional neural networks. At the same time, image synthesis using generative adversarial networks (GANs) has drastically improved over the last few years. The recently proposed TransGAN is the first GAN using only transformer-based architectures and achieves competitive results when compared to convolutional GANs. However, since transformers are data-hungry architectures, TransGAN requires data augmentation, an auxiliary super-resolution task during training, and a masking prior to guide the self-attention mechanism. In this paper, we study the combination of a transformer-based generator and convolutional discriminator and successfully remove the need for the aforementioned design choices. We…
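To make the hybrid design concrete, here is a minimal NumPy sketch of one forward pass. A transformer-style generator turns a latent vector into a grid of tokens, mixes them with self-attention, and decodes them to RGB pixels. A small convolutional discriminator then scores the result. This is not the authors' architecture. The layer sizes, the single attention block, the nearest-neighbour upsampling, and the names `TransformerGenerator`, `ConvDiscriminator` and `conv2d` are all hypothetical. The weights are random and untrained, so the sketch only shows how the two halves connect.

```python
import numpy as np

rng = np.random.default_rng(0)

def layer_norm(x, eps=1e-5):
    mu, var = x.mean(-1, keepdims=True), x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class TransformerGenerator:
    """Hypothetical tiny generator: latent -> token grid -> self-attention -> RGB."""
    def __init__(self, z_dim=64, grid=8, dim=32, upscale=4):
        self.grid, self.dim, self.upscale = grid, dim, upscale
        n = grid * grid
        self.w_in = rng.normal(0, 0.02, (z_dim, n * dim))
        self.pos = rng.normal(0, 0.02, (n, dim))  # positional embedding per token
        self.wq, self.wk, self.wv = (rng.normal(0, dim ** -0.5, (dim, dim)) for _ in range(3))
        self.w_mlp1 = rng.normal(0, dim ** -0.5, (dim, 4 * dim))
        self.w_mlp2 = rng.normal(0, (4 * dim) ** -0.5, (4 * dim, dim))
        self.w_rgb = rng.normal(0, dim ** -0.5, (dim, 3))

    def block(self, x):
        # Pre-norm single-head self-attention with a residual connection.
        h = layer_norm(x)
        q, k, v = h @ self.wq, h @ self.wk, h @ self.wv
        attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(self.dim))
        x = x + attn @ v
        # Position-wise ReLU MLP, also residual.
        h = layer_norm(x)
        return x + np.maximum(h @ self.w_mlp1, 0) @ self.w_mlp2

    def __call__(self, z):
        b = z.shape[0]
        x = (z @ self.w_in).reshape(b, self.grid * self.grid, self.dim) + self.pos
        x = self.block(x)
        img = x.reshape(b, self.grid, self.grid, self.dim)
        # Nearest-neighbour upsampling: a stand-in for real upsampling stages.
        img = img.repeat(self.upscale, axis=1).repeat(self.upscale, axis=2)
        return np.tanh(img @ self.w_rgb)  # (b, H, W, 3), values in [-1, 1]

def conv2d(x, w, stride):
    """Valid 2D convolution; x is NHWC, w is (kh, kw, c_in, c_out)."""
    kh, kw = w.shape[:2]
    b, h, wd, _ = x.shape
    oh, ow = (h - kh) // stride + 1, (wd - kw) // stride + 1
    out = np.empty((b, oh, ow, w.shape[3]))
    for i in range(oh):
        for j in range(ow):
            patch = x[:, i * stride:i * stride + kh, j * stride:j * stride + kw, :]
            out[:, i, j, :] = np.tensordot(patch, w, axes=([1, 2, 3], [0, 1, 2]))
    return out

class ConvDiscriminator:
    """Hypothetical small CNN: strided convs -> global average pool -> logit."""
    def __init__(self, channels=(3, 16, 32)):
        self.kernels = [rng.normal(0, (9 * c_in) ** -0.5, (3, 3, c_in, c_out))
                        for c_in, c_out in zip(channels[:-1], channels[1:])]
        self.w_out = rng.normal(0, channels[-1] ** -0.5, (channels[-1], 1))

    def __call__(self, img):
        x = img
        for k in self.kernels:
            x = conv2d(x, k, stride=2)
            x = np.where(x > 0, x, 0.2 * x)  # LeakyReLU(0.2)
        return x.mean(axis=(1, 2)) @ self.w_out  # (b, 1) real/fake logits

z = rng.normal(size=(2, 64))
fake = TransformerGenerator()(z)
logits = ConvDiscriminator()(fake)
print(fake.shape, logits.shape)  # (2, 32, 32, 3) (2, 1)
```

The abstract points to the intuition behind the split. The data-hungry transformer is used only in the generator, while the discriminator keeps the local inductive bias of convolutions. Whether that alone replaces augmentation and auxiliary tasks on a given dataset is an empirical question the sketch does not answer.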
