Combining Transformer Generators with Convolutional Discriminators

Ricard Durall; Stanislav Frolov; Jörn Hees; Federico Raue; Franz-Josef Pfreundt; Andreas Dengel; Janis Keuper

arXiv:2105.10189·cs.CV·July 13, 2021

TL;DR

This paper explores a hybrid GAN architecture combining transformer-based generators with convolutional discriminators, removing the need for additional training tricks, and demonstrates improved image synthesis quality and spectral properties.

Contribution

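The paper introduces a hybrid GAN that pairs a transformer-based generator with a convolutional discriminator. This pairing removes the need for the data augmentation, auxiliary super-resolution task, and masking prior that TransGAN requires during training.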

Findings

1. The hybrid model outperforms pure transformer GANs in image quality.
2. Combining the two architectures retains the benefits of attention in the generated images.
3. Removing the auxiliary tasks simplifies the training process.

Abstract

Transformer models have recently attracted much interest from computer vision researchers and have since been successfully employed for several problems traditionally addressed with convolutional neural networks. At the same time, image synthesis using generative adversarial networks (GANs) has drastically improved over the last few years. The recently proposed TransGAN is the first GAN using only transformer-based architectures and achieves competitive results when compared to convolutional GANs. However, since transformers are data-hungry architectures, TransGAN requires data augmentation, an auxiliary super-resolution task during training, and a masking prior to guide the self-attention mechanism. In this paper, we study the combination of a transformer-based generator and convolutional discriminator and successfully remove the need of the aforementioned required design choices. We…
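
As a rough illustration of the pairing the abstract describes, the following plain-Python sketch contrasts the two building blocks: self-attention, which lets every generator token see every other token, and a 3x3 convolution, which scores local neighborhoods in the discriminator. Everything here is an assumption made for illustration and is not the paper's architecture: the function names (`self_attention`, `toy_generator`, `conv3x3_score`) are hypothetical, the attention uses identity projections instead of learned query/key/value weights, the kernel is a fixed averaging filter, and nothing is trained.

```python
import math
import random


def self_attention(tokens):
    """Single-head self-attention with identity projections (toy simplification)."""
    dim = len(tokens[0])
    out = []
    for query in tokens:
        # Scaled dot-product score of this token against every token (global mixing).
        scores = [sum(a * b for a, b in zip(query, key)) / math.sqrt(dim) for key in tokens]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        out.append([sum(w / total * value[d] for w, value in zip(weights, tokens))
                    for d in range(dim)])
    return out


def toy_generator(noise_tokens, side):
    """Map side*side noise tokens to a side x side grayscale image in [-1, 1]."""
    mixed = self_attention(noise_tokens)
    # Collapse each token to one pixel; a real generator would use learned layers.
    pixels = [math.tanh(sum(token) / len(token)) for token in mixed]
    return [pixels[row * side:(row + 1) * side] for row in range(side)]


def conv3x3_score(image, kernel):
    """Valid 3x3 convolution followed by a mean: one scalar score per image."""
    height, width = len(image), len(image[0])
    responses = []
    for i in range(height - 2):
        for j in range(width - 2):
            # Each response only sees a local 3x3 window (local mixing).
            responses.append(sum(kernel[a][b] * image[i + a][j + b]
                                 for a in range(3) for b in range(3)))
    return sum(responses) / len(responses)


random.seed(0)
side, dim = 4, 8  # toy sizes chosen for the example
noise = [[random.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(side * side)]
fake_image = toy_generator(noise, side)

kernel = [[1.0 / 9.0] * 3 for _ in range(3)]  # fixed averaging filter, not learned
score = conv3x3_score(fake_image, kernel)
print(f"image size: {len(fake_image)}x{len(fake_image[0])}, discriminator score: {score:.4f}")
```

In a real model both parts would be learned networks trained adversarially; the sketch only shows how a token sequence becomes an image grid that a convolutional scorer can consume.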
