Image Transformer
Niki Parmar, Ashish Vaswani, Jakob Uszkoreit, Łukasz Kaiser, Noam Shazeer, Alexander Ku, Dustin Tran

TL;DR
This paper introduces an image generation model based on the Transformer architecture with local self-attention. It reports a state-of-the-art negative log-likelihood on ImageNet (3.77, down from 3.83) and strong results on image super-resolution.
Contribution
It adapts the Transformer to image generation by restricting self-attention to local neighborhoods, which lets the model process larger images, and it outperforms previous methods both in likelihood and in human evaluation.
Findings
Outperforms previous state-of-the-art in image likelihood on ImageNet
Achieves superior results in image super-resolution tasks
Generates images that fool humans more effectively than prior models
Abstract
Image generation has been successfully cast as an autoregressive sequence generation or transformation problem. Recent work has shown that self-attention is an effective way of modeling textual sequences. In this work, we generalize a recently proposed model architecture based on self-attention, the Transformer, to a sequence modeling formulation of image generation with a tractable likelihood. By restricting the self-attention mechanism to attend to local neighborhoods we significantly increase the size of images the model can process in practice, despite maintaining significantly larger receptive fields per layer than typical convolutional neural networks. While conceptually simple, our generative models significantly outperform the current state of the art in image generation on ImageNet, improving the best published negative log-likelihood on ImageNet from 3.83 to 3.77. We also…
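To make the idea of restricting self-attention to local neighborhoods concrete, here is a minimal sketch in plain Python. It assumes a flattened 1D sequence of pixel embeddings and a simple causal sliding window; the function names, the `window` parameter, and the sliding-window layout are illustrative assumptions, not the paper's exact blocking scheme or implementation.

```python
import math


def local_causal_attention(queries, keys, values, window):
    """Single-head scaled dot-product attention over a 1D sequence.

    Assumption for illustration: position i attends only to positions
    max(0, i - window + 1) .. i (a causal sliding window).
    """
    n = len(queries)
    d = len(queries[0])
    outputs = []
    for i in range(n):
        start = max(0, i - window + 1)
        neighborhood = range(start, i + 1)
        # Scaled dot-product scores against the local neighborhood only.
        scores = [
            sum(q * k for q, k in zip(queries[i], keys[j])) / math.sqrt(d)
            for j in neighborhood
        ]
        # Numerically stable softmax over the neighborhood.
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of the neighborhood's value vectors.
        outputs.append([
            sum(w * values[j][c] for w, j in zip(weights, neighborhood))
            for c in range(len(values[0]))
        ])
    return outputs


def attended_pairs(n, window):
    """Number of (query, key) pairs scored for a length-n sequence."""
    return sum(min(i + 1, window) for i in range(n))


if __name__ == "__main__":
    x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
    print(local_causal_attention(x, x, x, window=2))
    # Full causal attention scores n(n+1)/2 pairs; a local window scores
    # roughly n * window pairs, which grows linearly in sequence length.
    print(attended_pairs(1024, 1024), attended_pairs(1024, 64))
```

With a fixed window, the number of scored pairs grows linearly in sequence length rather than quadratically, which is the sense in which locality makes larger images tractable in this sketch.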
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Computational Physics and Python Applications · Cell Image Analysis Techniques
