DiffuseVAE: Efficient, Controllable and High-Fidelity Generation from Low-Dimensional Latents

Kushagra Pandey; Avideep Mukherjee; Piyush Rai; Abhishek Kumar

arXiv:2201.00308 · cs.LG · November 30, 2022 · 45 cites

TL;DR

DiffuseVAE combines VAEs and diffusion models to enable efficient, high-fidelity, and controllable image synthesis from a low-dimensional latent representation, improving the speed vs. quality tradeoff of standard unconditional DDPM/DDIM models.

Contribution

The paper introduces a framework that integrates a VAE within a diffusion model. This gives the diffusion model a low-dimensional, VAE-inferred latent code and a better speed vs. quality tradeoff for image generation.

Findings

1. Achieves state-of-the-art synthesis quality on benchmarks like CIFAR-10 and CelebA-64.

2. Improves the speed vs. quality tradeoff of standard unconditional DDPM/DDIM models, so that fewer sampling steps are needed for a given sample quality.

3. Provides controllable image synthesis using low-dimensional latent codes.

Abstract

Diffusion probabilistic models have been shown to generate state-of-the-art results on several competitive image synthesis benchmarks but lack a low-dimensional, interpretable latent space, and are slow at generation. On the other hand, standard Variational Autoencoders (VAEs) typically have access to a low-dimensional latent space but exhibit poor sample quality. We present DiffuseVAE, a novel generative framework that integrates VAE within a diffusion model framework, and leverage this to design novel conditional parameterizations for diffusion models. We show that the resulting model equips diffusion models with a low-dimensional VAE inferred latent code which can be used for downstream tasks like controllable synthesis. The proposed method also improves upon the speed vs quality tradeoff exhibited in standard unconditional DDPM/DDIM models (for instance, FID of 16.47 vs 34.36 using…
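The idea of conditioning a diffusion model on a VAE output can be sketched as a two-stage sampler: decode a low-dimensional latent into a coarse sample, then run a reverse diffusion loop that sees that coarse sample at every step. The code below is a minimal NumPy illustration under stated assumptions, not the authors' implementation. `toy_decoder` and `toy_eps_model` are hypothetical stand-ins for trained networks, and the linear schedule, step count, and dimensions are arbitrary choices. The stand-in noise predictor treats the VAE output as the clean image, so the loop only demonstrates the control flow; a trained conditional network would add detail beyond the coarse sample.

```python
import numpy as np


def linear_betas(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule (illustrative defaults, an assumption)."""
    return np.linspace(beta_start, beta_end, num_steps)


def toy_decoder(z, out_dim):
    """Hypothetical stand-in for a trained VAE decoder: fixed linear map + tanh."""
    w = np.random.default_rng(0).standard_normal((z.shape[-1], out_dim))
    return np.tanh(z @ w / np.sqrt(z.shape[-1]))


def toy_eps_model(x_t, t, x_hat, alpha_bars):
    """Hypothetical stand-in for a trained conditional noise predictor.

    It treats the VAE output x_hat as the clean image, so the reverse
    process below drifts back to x_hat.
    """
    return (x_t - np.sqrt(alpha_bars[t]) * x_hat) / np.sqrt(1.0 - alpha_bars[t])


def two_stage_sample(z, out_dim=16, num_steps=50, seed=1):
    rng = np.random.default_rng(seed)
    betas = linear_betas(num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)

    # Stage 1: decode the low-dimensional latent into a coarse sample.
    x_hat = toy_decoder(z, out_dim)

    # Stage 2: DDPM-style ancestral sampling, conditioned on x_hat at every step.
    x = rng.standard_normal(x_hat.shape)
    for t in range(num_steps - 1, -1, -1):
        eps = toy_eps_model(x, t, x_hat, alpha_bars)
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        # No noise is added on the final step.
        noise = rng.standard_normal(x.shape) if t > 0 else 0.0
        x = mean + np.sqrt(betas[t]) * noise
    return x_hat, x


if __name__ == "__main__":
    z = np.random.default_rng(2).standard_normal((4, 8))  # batch of 4 latents, 8-dim
    coarse, refined = two_stage_sample(z)
    print(coarse.shape, refined.shape)
```

Because the latent `z` fully determines the coarse sample, editing `z` is the handle for controlling the final output in this sketch; the stochastic reverse loop only refines around it.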

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Machine Learning in Materials Science · Domain Adaptation and Few-Shot Learning

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Diffusion