NP-DRAW: A Non-Parametric Structured Latent Variable Model for Image Generation
Xiaohui Zeng, Raquel Urtasun, Richard Zemel, Sanja Fidler, Renjie Liao

TL;DR
NP-DRAW introduces a non-parametric, part-by-part image generation model using a latent canvas and Transformer dependency modeling, achieving superior performance and interpretability over previous structured models.
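A minimal sketch of the part-by-part drawing loop on a latent canvas. It assumes each step picks a part index k from a fixed bank of K part appearances, which is the categorical "what-to-draw" choice. It also assumes a top-left location for each part, a "where-to-draw" choice that is our assumption and is not taken from this summary. The names `draw_on_canvas` and `part_bank`, and the element-wise max used to combine overlapping parts, are hypothetical illustrations, not NP-DRAW's implementation. The learned decoder that turns the canvas into an image is omitted.

```python
import numpy as np

def draw_on_canvas(part_bank, steps, canvas_size):
    """Compose parts onto a blank latent canvas, one step at a time.

    part_bank: (K, p, p) array, a hypothetical bank of K part appearances.
    steps: list of (k, row, col); k indexes part_bank ("what"),
        (row, col) is the assumed top-left placement ("where").
    canvas_size: (H, W) of the latent canvas.
    """
    H, W = canvas_size
    p = part_bank.shape[1]
    canvas = np.zeros((H, W), dtype=float)
    for k, row, col in steps:
        if row < 0 or col < 0 or row + p > H or col + p > W:
            raise ValueError(f"part {k} at ({row}, {col}) leaves the canvas")
        region = canvas[row:row + p, col:col + p]
        # Assumed compositing rule: overlapping parts keep the stronger value.
        canvas[row:row + p, col:col + p] = np.maximum(region, part_bank[k])
    return canvas

# Usage: a bank of two 2x2 parts (a filled square and a diagonal stroke).
bank = np.stack([np.ones((2, 2)), np.eye(2)])
canvas = draw_on_canvas(bank, [(0, 0, 0), (1, 2, 2)], (4, 4))
```

Because every step only selects an index from the bank, the per-step "what" variable is discrete, which is the property the abstract credits with easing learning compared to Gaussian latents.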
Contribution
The paper presents a novel non-parametric prior, Transformer-based dependency modeling, and a heuristic parsing algorithm for pre-training the prior, together improving structured image generation.
Findings
Outperforms previous models like DRAW and AIR on multiple datasets.
Achieves results competitive with other generative models.
Enhances low-data learning and latent space interpretability.
Abstract
In this paper, we present a non-parametric structured latent variable model for image generation, called NP-DRAW, which sequentially draws on a latent canvas in a part-by-part fashion and then decodes the image from the canvas. Our key contributions are as follows. 1) We propose a non-parametric prior distribution over the appearance of image parts so that the latent variable "what-to-draw" per step becomes a categorical random variable. This improves the expressiveness and greatly eases the learning compared to Gaussians used in the literature. 2) We model the sequential dependency structure of parts via a Transformer, which is more powerful and easier to train compared to RNNs used in the literature. 3) We propose an effective heuristic parsing algorithm to pre-train the prior. Experiments on MNIST, Omniglot, CIFAR-10, and CelebA show that our method significantly outperforms…
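To illustrate how a Transformer can model the sequential dependency between categorical "what-to-draw" choices, here is a minimal sketch under strong simplifying assumptions. It uses a single causal self-attention head with no feed-forward layer, layer normalization, or positional encodings, and random weights in place of trained ones. It also assumes an extra start token stored at row K of the embedding table. The functions `causal_attention_logits` and `sample_parts` are hypothetical, and the paper's actual architecture is not specified in this summary.

```python
import numpy as np

def causal_attention_logits(tokens, emb, w_q, w_k, w_v, w_out):
    """Single-head causal self-attention giving next-part logits.

    tokens: previously chosen indices (start token first), length T >= 1.
    emb: (K + 1, d) embeddings; row K is the assumed start token.
    w_q, w_k, w_v: (d, d) projections; w_out: (d, K) maps to K parts.
    Returns (T, K) logits; row t scores the part chosen at step t + 1.
    """
    x = emb[np.asarray(tokens)]                       # (T, d)
    queries, keys, values = x @ w_q, x @ w_k, x @ w_v
    scores = queries @ keys.T / np.sqrt(x.shape[1])   # (T, T)
    T = len(tokens)
    future = np.triu(np.ones((T, T), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)        # block attention to later steps
    scores = scores - scores.max(axis=1, keepdims=True)
    attn = np.exp(scores)
    attn = attn / attn.sum(axis=1, keepdims=True)
    return (attn @ values) @ w_out

def sample_parts(num_steps, start_token, params, rng):
    """Autoregressively sample categorical part indices from the prior."""
    tokens = [start_token]
    for _ in range(num_steps):
        logits = causal_attention_logits(tokens, *params)[-1]
        probs = np.exp(logits - logits.max())
        probs = probs / probs.sum()                   # softmax over K parts
        tokens.append(int(rng.choice(len(probs), p=probs)))
    return tokens[1:]

# Usage with random (untrained) weights: K = 5 parts, width d = 8.
rng = np.random.default_rng(0)
K, d = 5, 8
params = (rng.normal(size=(K + 1, d)), rng.normal(size=(d, d)),
          rng.normal(size=(d, d)), rng.normal(size=(d, d)),
          rng.normal(size=(d, K)))
parts = sample_parts(4, K, params, rng)
```

The causal mask is what makes this a valid autoregressive prior: the logits for step t depend only on steps up to t, so appending a new part never changes earlier predictions.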
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Face recognition and analysis
Methods: Multi-Head Attention · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Attention Is All You Need · Byte Pair Encoding · Dropout · Layer Normalization · Adam · Label Smoothing
