Vit-GAN: Image-to-image Translation with Vision Transformes and Conditional GANS

Yiğit Gündüç

arXiv:2110.09305·eess.IV·October 19, 2021

TL;DR

Vit-GAN is a general-purpose image-to-image translation architecture that pairs a vision-transformer-based generator with a conditional GAN. The author reports results that are more realistic than those of commonly used architectures on tasks ranging from semantic segmentation to single-image depth perception.

Contribution

The paper presents a vision-transformer-based generator trained within a conditional GAN that uses a Markovian (PatchGAN) discriminator, with images as the conditioning input.

Findings

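01 Translation results are reported as more realistic than those of commonly used architectures.

02 A single architecture covers multiple image-to-image translation tasks, from semantic segmentation to single-image depth perception.

03 Adding an adversarial (cGAN) setup builds on the promising results of the earlier generator-only model.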

Abstract

In this paper, we develop a general-purpose architecture, Vit-GAN, capable of performing most image-to-image translation tasks, from semantic image segmentation to single-image depth perception. This paper is a follow-up to, and an extension of, the generator-based model [1], whose results were very promising and opened the possibility of further improvement with an adversarial architecture. We use a unique vision-transformer-based generator architecture and conditional GANs (cGANs) with a Markovian discriminator (PatchGAN); the code is available at https://github.com/YigitGunduc/vit-gan. In the present work, we use images as the conditioning arguments. We observe that the obtained results are more realistic than those of commonly used architectures.
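
The abstract names a Markovian (PatchGAN) discriminator, which scores a grid of image patches as real or fake rather than emitting one score per image. The sketch below illustrates how such patch-level scores are commonly turned into adversarial losses. It is a generic illustration under stated assumptions, not code from the paper or its repository: the function names are hypothetical, the binary cross-entropy form is the usual cGAN choice and may differ from the author's exact objective, and the conditioning step (feeding the input image to the discriminator together with the real or generated output) is assumed to happen before these logits are produced.

```python
import math


def bce_with_logits(logit, target):
    """Numerically stable binary cross-entropy for one raw logit."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def patch_mean(patch_logits, target):
    """Average the per-patch loss over a 2D grid of discriminator logits."""
    values = [bce_with_logits(z, target) for row in patch_logits for z in row]
    return sum(values) / len(values)


def discriminator_loss(real_patch_logits, fake_patch_logits):
    """Hypothetical helper: patches from real pairs should score 1, generated pairs 0."""
    return patch_mean(real_patch_logits, 1.0) + patch_mean(fake_patch_logits, 0.0)


def generator_adversarial_loss(fake_patch_logits):
    """Hypothetical helper: the generator is rewarded when its patches score as real."""
    return patch_mean(fake_patch_logits, 1.0)


if __name__ == "__main__":
    # A 3x3 grid of patch logits standing in for discriminator outputs.
    real = [[2.0, 1.5, 3.0], [0.5, 2.2, 1.8], [2.9, 1.1, 0.7]]
    fake = [[-1.0, -2.5, 0.3], [-0.4, -1.9, -3.0], [0.1, -2.2, -1.3]]
    print("discriminator loss:", discriminator_loss(real, fake))
    print("generator adversarial loss:", generator_adversarial_loss(fake))
```

Averaging over the patch grid is what makes the discriminator "Markovian": each logit depends only on a local region of the image, so the loss penalizes unrealistic local texture wherever it appears.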

Code & Models

Repositories

yigitgunduc/vit-gan (TensorFlow, official)
