ViT-Inception-GAN for Image Colourising
Tejas Bana, Jatan Loya, Siddhant Kulkarni

TL;DR
This paper introduces ViT-I-GAN, an image colourisation model that combines an Inception-v3 embedding in the generator, a Vision Transformer (ViT) discriminator, and adversarial (GAN) training, and evaluates it on the Unsplash and COCO datasets.
Contribution
It proposes a hybrid GAN architecture for image colourisation that fuses an Inception-v3 embedding into the generator and uses a ViT as the discriminator.
Findings
The Inception-v3 embedding improves colourisation results.
ViT-I-GAN outperforms a ViT-GAN without the Inception-v3 embedding.
The model is trained on the Unsplash and COCO datasets.
Abstract
Research on image colourisation has been attracting researchers' keen attention over time, helped by significant advances in machine learning techniques and the availability of compute power. Colourising images has traditionally been an intricate task because it leaves a substantial degree of freedom in assigning chromatic information. In our proposed method, we colourise images using a Vision Transformer - Inception - Generative Adversarial Network (ViT-I-GAN), which has an Inception-v3 fusion embedding in the generator. For a stable and robust network, we use a Vision Transformer (ViT) as the discriminator. We trained the model on the Unsplash and COCO datasets to demonstrate the improvement made by the Inception-v3 embedding, and we compare the results of ViT-GANs with and without the Inception-v3 embedding.
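The abstract does not say how the Inception-v3 embedding is fused into the generator. The sketch below assumes a fusion layer in the style of Deep Koalarization (Baldassarre et al., 2017, which used Inception-ResNet-v2): the global embedding vector is copied to every spatial position of the encoder's feature map, concatenated along the channel axis, and mixed back down with a 1x1 convolution and ReLU. All function names, shapes and the toy weights are hypothetical illustrations, not the authors' code; in a real model `embedding` would be the Inception-v3 output for the input image and `features` would come from the generator's encoder.

```python
from typing import List

# Hypothetical shapes: a feature map is indexed [row][col][channel].
FeatureMap = List[List[List[float]]]
Matrix = List[List[float]]


def tile_and_concat(features: FeatureMap, embedding: List[float]) -> FeatureMap:
    """Copy the global embedding to every pixel and append it to that
    pixel's local encoder channels (channel-axis concatenation)."""
    return [[pixel + embedding for pixel in row] for row in features]


def conv1x1(features: FeatureMap, weights: Matrix, bias: List[float]) -> FeatureMap:
    """1x1 convolution: the same linear map applied independently at each
    pixel, out[k] = sum_c weights[k][c] * in[c] + bias[k]."""
    return [
        [
            [sum(w * x for w, x in zip(w_row, pixel)) + b for w_row, b in zip(weights, bias)]
            for pixel in row
        ]
        for row in features
    ]


def relu(features: FeatureMap) -> FeatureMap:
    """Element-wise max(0, v)."""
    return [[[max(0.0, v) for v in pixel] for pixel in row] for row in features]


def fusion_layer(
    features: FeatureMap, embedding: List[float], weights: Matrix, bias: List[float]
) -> FeatureMap:
    """Hypothetical fusion step: tile + concat, then 1x1 conv + ReLU."""
    in_channels = len(features[0][0]) + len(embedding)
    if any(len(w_row) != in_channels for w_row in weights):
        raise ValueError(f"each weight row must have {in_channels} entries")
    if len(bias) != len(weights):
        raise ValueError("need one bias per output channel")
    return relu(conv1x1(tile_and_concat(features, embedding), weights, bias))


# Toy example: 2x2 map with 2 local channels and a 3-dim global embedding.
features = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0], [0.0, 0.0]]]
embedding = [0.5, -0.5, 1.0]
weights = [
    [1.0, 0.0, 0.0, 0.0, 0.0],  # output 0 copies local channel 0
    [0.0, 0.0, 1.0, 0.0, 1.0],  # output 1 adds embedding entries 0 and 2
]
bias = [0.0, 0.0]
fused = fusion_layer(features, embedding, weights, bias)
print(fused)  # [[[1.0, 1.5], [0.0, 1.5]], [[1.0, 1.5], [0.0, 1.5]]]
```

The 1x1 convolution lets every output channel mix local detail with global scene information at each pixel without changing the spatial resolution, which is why this kind of fusion is typically placed at the encoder-decoder bottleneck.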
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Image Enhancement Techniques · Advanced Image Processing Techniques
Methods: Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · 1x1 Convolution · Auxiliary Classifier · Average Pooling · Inception-v3 Module · Byte Pair Encoding · Multi-Head Attention
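The paper uses a ViT as the discriminator but the summary gives no configuration. The Linear Layer and Absolute Position Encodings entries in the Methods list correspond to the standard ViT input stage (Dosovitskiy et al., 2020): split the image into non-overlapping patches, project each flattened patch linearly, prepend a class token, and add a learned position embedding per token. The sketch below shows only that stage, assuming a class token is used; the patch size, embedding width, toy weights and function names are hypothetical, not taken from the paper.

```python
from typing import List

Image = List[List[List[float]]]  # indexed [row][col][channel]
Matrix = List[List[float]]


def patchify(image: Image, patch: int) -> Matrix:
    """Split an HxWxC image into non-overlapping patch x patch tiles, in
    raster order, and flatten each tile into one vector."""
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image size must be divisible by the patch size")
    patches = []
    for r0 in range(0, height, patch):
        for c0 in range(0, width, patch):
            flat: List[float] = []
            for r in range(r0, r0 + patch):
                for c in range(c0, c0 + patch):
                    flat.extend(image[r][c])
            patches.append(flat)
    return patches


def embed_tokens(patches: Matrix, proj: Matrix, cls: List[float], pos: Matrix) -> Matrix:
    """Linear patch projection, a prepended class token, and learned absolute
    position embeddings added element-wise (one row per token)."""
    projected = [[sum(w * x for w, x in zip(w_row, p)) for w_row in proj] for p in patches]
    tokens = [cls] + projected
    if len(pos) != len(tokens):
        raise ValueError("need one position embedding per token, including the class token")
    return [[t + e for t, e in zip(tok, pe)] for tok, pe in zip(tokens, pos)]


# Toy example: 4x4 single-channel image with pixel values 0..15, 2x2 patches.
image = [[[float(r * 4 + c)] for c in range(4)] for r in range(4)]
patches = patchify(image, patch=2)
proj = [[1.0, 1.0, 1.0, 1.0],  # embedding dim 0: sum of the patch
        [1.0, 0.0, 0.0, 0.0]]  # embedding dim 1: top-left pixel of the patch
cls = [0.0, 0.0]
pos = [[float(i), 0.0] for i in range(len(patches) + 1)]
tokens = embed_tokens(patches, proj, cls, pos)
print(tokens)  # [[0.0, 0.0], [11.0, 0.0], [20.0, 2.0], [45.0, 8.0], [54.0, 10.0]]
```

In a full ViT discriminator these tokens would then pass through Transformer encoder blocks (multi-head self-attention plus position-wise feed-forward layers), and a linear head on the class token would produce the real/fake score; that part is omitted here.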
