Colorization Transformer
Manoj Kumar, Dirk Weissenborn, Nal Kalchbrenner

TL;DR
The Colorization Transformer is a self-attention-based method for diverse, high-fidelity image colorization. It outperforms the previous state of the art on ImageNet colorization in both FID and human evaluation.
Contribution
The paper proposes a three-step architecture for image colorization: a conditional autoregressive transformer produces a low-resolution coarse coloring of the grayscale image, and two fully parallel networks then upsample it into a finely colored high-resolution image.
Findings
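Outperforms the previous state of the art on ImageNet colorization as measured by FID
In more than 60% of cases, human evaluators prefer the highest rated of three generated colorings over the ground truth
Sampling produces diverse colorizations of the same grayscale image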
Abstract
We present the Colorization Transformer, a novel approach for diverse high fidelity image colorization based on self-attention. Given a grayscale image, the colorization proceeds in three steps. We first use a conditional autoregressive transformer to produce a low resolution coarse coloring of the grayscale image. Our architecture adopts conditional transformer layers to effectively condition grayscale input. Two subsequent fully parallel networks upsample the coarse colored low resolution image into a finely colored high resolution image. Sampling from the Colorization Transformer produces diverse colorings whose fidelity outperforms the previous state-of-the-art on colorising ImageNet based on FID results and based on a human evaluation in a Mechanical Turk test. Remarkably, in more than 60% of cases human evaluators prefer the highest rated among three generated colorings over the…
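The three steps described in the abstract can be illustrated with a shape-level sketch. This is not the authors' implementation: each stage is a trivial stand-in (uniform sampling, a fixed lookup, nearest-neighbor resizing) for a learned network, and the constants `COARSE_BITS`, `LOW_RES`, and `SCALE`, along with all function names, are assumptions chosen only to keep the example small and runnable. It shows the control flow only: one sequential sampling step at low resolution, followed by two steps that process all pixels in parallel.

```python
import numpy as np

COARSE_BITS = 3   # assumed coarse color depth per channel (illustrative)
LOW_RES = 8       # assumed low resolution, kept tiny so the sketch runs fast
SCALE = 4         # assumed spatial upsampling factor (illustrative)


def downsample(gray, factor):
    """Average-pool a (H, W) grayscale image by an integer factor."""
    h, w = gray.shape
    return gray.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))


def coarse_autoregressive(gray_lr, rng):
    """Step 1 stand-in: sample coarse colors pixel by pixel in raster order."""
    h, w = gray_lr.shape
    levels = 2 ** COARSE_BITS
    coarse = np.zeros((h, w, 3), dtype=np.int64)
    for i in range(h):
        for j in range(w):
            # A trained model would predict this distribution from gray_lr and
            # the pixels sampled so far; a uniform distribution is a placeholder.
            probs = np.full(levels, 1.0 / levels)
            coarse[i, j] = rng.choice(levels, size=3, p=probs)
    return coarse


def color_upsample(coarse):
    """Step 2 stand-in: map coarse levels to 8-bit values for all pixels at once."""
    step = 256 // (2 ** COARSE_BITS)
    return (coarse * step + step // 2).astype(np.uint8)


def spatial_upsample(rgb_lr, factor):
    """Step 3 stand-in: nearest-neighbor upsampling for all pixels at once."""
    return rgb_lr.repeat(factor, axis=0).repeat(factor, axis=1)


def colorize(gray, seed):
    """Run the three stand-in steps; different seeds give different colorings."""
    rng = np.random.default_rng(seed)
    gray_lr = downsample(gray, SCALE)
    coarse = coarse_autoregressive(gray_lr, rng)
    return spatial_upsample(color_upsample(coarse), SCALE)


side = LOW_RES * SCALE
gray = np.linspace(0.0, 1.0, side * side).reshape(side, side)
sample_a = colorize(gray, seed=0)
sample_b = colorize(gray, seed=1)
```

Because only the first step samples, all diversity between `sample_a` and `sample_b` originates in the coarse coloring; the two later steps are deterministic in this sketch.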
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Advanced Image Processing Techniques · Image Enhancement Techniques
Methods: Absolute Position Encodings · Position-Wise Feed-Forward Layer · Linear Layer · Axial Attention · Colorization Transformer · Colorization · Attention Is All You Need · Dense Connections · Byte Pair Encoding · Softmax
