PixelCNN++: Improving the PixelCNN with Discretized Logistic Mixture Likelihood and Other Modifications

Tim Salimans; Andrej Karpathy; Xi Chen; Diederik P. Kingma

arXiv:1701.05517 · cs.LG · January 24, 2017 · 75 citations

TL;DR

This paper improves PixelCNN by replacing the 256-way softmax with a discretized logistic mixture likelihood, conditioning on whole pixels rather than R/G/B sub-pixels, and adding downsampling, extra short-cut connections, and dropout. The resulting model trains faster and achieves state-of-the-art log likelihood on CIFAR-10.

Contribution

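The paper presents a publicly available PixelCNN implementation with several modifications: a new likelihood function (a discretized logistic mixture), a simplified model structure, multi-resolution processing via downsampling, additional short-cut connections, and dropout regularization. Together these speed up training and improve log likelihood.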

Findings

01. Achieved state-of-the-art log likelihood on CIFAR-10
02. Discretized logistic mixture likelihood speeds up training
03. Model modifications improve performance and training efficiency

Abstract

PixelCNNs are a recently proposed class of powerful generative models with tractable likelihood. Here we discuss our implementation of PixelCNNs which we make available at https://github.com/openai/pixel-cnn. Our implementation contains a number of modifications to the original model that both simplify its structure and improve its performance. 1) We use a discretized logistic mixture likelihood on the pixels, rather than a 256-way softmax, which we find to speed up training. 2) We condition on whole pixels, rather than R/G/B sub-pixels, simplifying the model structure. 3) We use downsampling to efficiently capture structure at multiple resolutions. 4) We introduce additional short-cut connections to further speed up optimization. 5) We regularize the model using dropout. Finally, we present state-of-the-art log likelihood results on CIFAR-10 to demonstrate the usefulness of these…
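
The abstract names the discretized logistic mixture likelihood but does not spell it out. As a rough illustration, the sketch below assigns each integer pixel value the probability mass that a mixture of logistic distributions places on the bin [x - 0.5, x + 0.5], with the two edge bins extended to minus and plus infinity so the masses sum to one. This is a minimal sketch under stated assumptions: it covers a single channel with fixed mixture parameters, whereas a full model would predict those parameters from the preceding pixels. The function names and the parameters `weights`, `means`, `scales`, and `levels` are hypothetical and are not taken from the authors' repository.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function (the logistic CDF at z).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def discretized_logistic_mixture_pmf(x, weights, means, scales, levels=256):
    """Probability of integer pixel value x under a logistic mixture.

    Assumed parameterization (illustrative only): `weights` sum to 1,
    `means` are on the 0..levels-1 pixel scale, `scales` are positive.
    """
    if not 0 <= x < levels:
        raise ValueError("x must be an integer in [0, levels - 1]")
    total = 0.0
    for w, mu, s in zip(weights, means, scales):
        # CDF at the upper bin edge; the last bin extends to +infinity.
        upper = 1.0 if x == levels - 1 else sigmoid((x + 0.5 - mu) / s)
        # CDF at the lower bin edge; the first bin extends to -infinity.
        lower = 0.0 if x == 0 else sigmoid((x - 0.5 - mu) / s)
        total += w * (upper - lower)
    return total


def negative_log_likelihood_bits(x, weights, means, scales, levels=256):
    # Cost of encoding pixel value x, in bits.
    p = discretized_logistic_mixture_pmf(x, weights, means, scales, levels)
    return -math.log2(p)


if __name__ == "__main__":
    # Hypothetical two-component mixture for one channel.
    weights, means, scales = [0.7, 0.3], [100.0, 200.0], [5.0, 10.0]
    total = sum(
        discretized_logistic_mixture_pmf(v, weights, means, scales)
        for v in range(256)
    )
    print("sum over all 256 values:", total)
    print("bits for x=100:", negative_log_likelihood_bits(100, weights, means, scales))
```

One way to read the design choice: a 256-way softmax needs 256 outputs per sub-pixel and treats neighboring intensities as unrelated classes, while a mixture like this needs only three numbers per component and assigns similar probability to nearby intensities by construction.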

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings