NP-DRAW: A Non-Parametric Structured Latent Variable Model for Image Generation

Xiaohui Zeng; Raquel Urtasun; Richard Zemel; Sanja Fidler; Renjie Liao

arXiv:2106.13435·cs.CV·July 7, 2021

TL;DR

NP-DRAW is a non-parametric structured latent variable model that generates images part by part on a latent canvas and models the dependencies between parts with a Transformer. It outperforms previous structured image models and yields a more interpretable latent space.

Contribution

The paper presents a non-parametric prior over the appearance of image parts, a Transformer that models the sequential dependencies between parts, and a heuristic parsing algorithm for pre-training the prior.

Findings

01. Outperforms previous structured image models such as DRAW and AIR on MNIST, Omniglot, CIFAR-10, and CelebA.

02. Is competitive with other generative models.

03. Improves learning in low-data settings and the interpretability of the latent space.

Abstract

In this paper, we present a non-parametric structured latent variable model for image generation, called NP-DRAW, which sequentially draws on a latent canvas in a part-by-part fashion and then decodes the image from the canvas. Our key contributions are as follows. 1) We propose a non-parametric prior distribution over the appearance of image parts so that the latent variable "what-to-draw" per step becomes a categorical random variable. This improves the expressiveness and greatly eases the learning compared to Gaussians used in the literature. 2) We model the sequential dependency structure of parts via a Transformer, which is more powerful and easier to train compared to RNNs used in the literature. 3) We propose an effective heuristic parsing algorithm to pre-train the prior. Experiments on MNIST, Omniglot, CIFAR-10, and CelebA show that our method significantly outperforms…
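
To make the "what-to-draw" idea concrete, here is a minimal sketch, not the authors' implementation, of drawing on a canvas when each step's appearance variable is a categorical index into a bank of stored patches. The function `draw_canvas`, the toy `patch_bank`, the uniform sampling of indices and locations, and the overwrite-style pasting are all assumptions made for illustration. In the paper the dependencies between steps are modeled by a Transformer and the image is decoded from the canvas, and neither of those is shown here.

```python
import random


def draw_canvas(patch_bank, canvas_size, num_steps, rng):
    """Paste randomly chosen exemplar patches onto a blank square canvas.

    Hypothetical helper: indices and locations are sampled uniformly here,
    which stands in for a learned prior over the drawing sequence.
    """
    patch_h = len(patch_bank[0])
    patch_w = len(patch_bank[0][0])
    canvas = [[0.0] * canvas_size for _ in range(canvas_size)]
    steps = []
    for _ in range(num_steps):
        # "What to draw": a categorical index into the bank of stored patches.
        what = rng.randrange(len(patch_bank))
        # "Where to draw": a top-left corner chosen so the patch fits.
        row = rng.randrange(canvas_size - patch_h + 1)
        col = rng.randrange(canvas_size - patch_w + 1)
        for i in range(patch_h):
            for j in range(patch_w):
                # Assumption: a new patch overwrites what is already there.
                canvas[row + i][col + j] = patch_bank[what][i][j]
        steps.append((what, row, col))
    return canvas, steps


# Toy bank of three 2x2 binary patches (illustrative, not taken from data).
patch_bank = [
    [[1.0, 1.0], [0.0, 0.0]],
    [[1.0, 0.0], [1.0, 0.0]],
    [[1.0, 0.0], [0.0, 1.0]],
]

canvas, steps = draw_canvas(
    patch_bank, canvas_size=6, num_steps=4, rng=random.Random(0)
)
for canvas_row in canvas:
    print("".join("#" if value > 0.5 else "." for value in canvas_row))
```

Because every step picks one of a finite set of patches, the appearance latent is a discrete choice rather than a continuous Gaussian code, which is the property the abstract credits with easing learning.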

Code & Models

Repositories

ZENGXH/NPDRAW
PyTorch · Official

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Face recognition and analysis

Methods: Multi-Head Attention · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Attention Is All You Need · Byte Pair Encoding · Dropout · Layer Normalization · Adam · Label Smoothing