ViT-Inception-GAN for Image Colourising

Tejas Bana; Jatan Loya; Siddhant Kulkarni

arXiv:2106.06321·cs.CV·June 14, 2021

Open Access

TL;DR

This paper introduces ViT-I-GAN, a novel image colourisation model combining Vision Transformer, Inception-v3, and GANs, trained on large datasets to improve colourisation quality.

Contribution

It proposes a new hybrid architecture integrating ViT, Inception-v3, and GANs for more effective image colourisation.

Findings

1. Inception-v3 embedding improves colourisation results.
2. ViT-I-GAN outperforms models without Inception-v3.
3. The model is trained on large datasets like Unsplash and COCO.

Abstract

Studies on image colourisation have been attracting researchers' keen attention over time, assisted by significant advances in various Machine Learning techniques and the availability of compute power. Traditionally, colourising images has been an intricate task that allows a substantial degree of freedom in the assignment of chromatic information. In our proposed method, we attempt to colourise images using a Vision Transformer - Inception - Generative Adversarial Network (ViT-I-GAN), which has an Inception-v3 fusion embedding in the generator. For a stable and robust network, we use a Vision Transformer (ViT) as the discriminator. We trained the model on the Unsplash and COCO datasets to demonstrate the improvement made by the Inception-v3 embedding. We compare the results of ViT-GANs with and without the Inception-v3 embedding.
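
The abstract does not spell out how the Inception-v3 fusion embedding enters the generator. As a rough illustration only, the sketch below assumes one common form of fusion: a single global embedding vector is copied to every spatial position of a feature map and concatenated along the channel axis. The function name `fuse_embedding`, the nested-list layout, and the toy sizes are all hypothetical and are not taken from the paper.

```python
from typing import List

# Feature map stored as nested lists, indexed as [row][col][channel].
Feature = List[List[List[float]]]


def fuse_embedding(features: Feature, embedding: List[float]) -> Feature:
    """Append a global embedding vector to the channels at every position.

    Assumption: fusion is plain tiling plus channel-wise concatenation.
    The paper's actual fusion layer may differ.
    """
    return [[list(pixel) + list(embedding) for pixel in row] for row in features]


# Toy example: a 2x2 feature map with 3 channels and a 4-dimensional embedding.
features = [[[0.1, 0.2, 0.3] for _ in range(2)] for _ in range(2)]
embedding = [1.0, 2.0, 3.0, 4.0]

fused = fuse_embedding(features, embedding)
print(len(fused), len(fused[0]), len(fused[0][0]))  # prints "2 2 7"
```

In a real network the fused map would typically pass through a learned layer (for example a 1x1 convolution) that mixes local features with the global embedding; that step is omitted here.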

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Image Enhancement Techniques · Advanced Image Processing Techniques

Methods: Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · 1x1 Convolution · Auxiliary Classifier · Average Pooling · Inception-v3 Module · Byte Pair Encoding · Multi-Head Attention