Latent Denoising Diffusion GAN: Faster sampling, Higher image quality

Luan Thanh Trinh; Tomoki Hamagami

arXiv:2406.11713 · cs.CV · June 18, 2024

TL;DR

This paper introduces the Latent Denoising Diffusion GAN, which uses pre-trained autoencoders to compress images into a compact latent space and adds a Weighted Learning strategy, improving both inference speed and image quality over earlier diffusion models such as DiffusionGAN and Wavelet Diffusion.

Contribution

It presents a novel latent space approach combined with weighted learning to enhance diffusion model efficiency and output quality, surpassing prior methods like DiffusionGAN and Wavelet Diffusion.

Findings

1. Achieves state-of-the-art speed among diffusion models.
2. Shows significant improvements in image quality metrics.
3. Demonstrates effectiveness across multiple datasets.

Abstract

Diffusion models are emerging as powerful solutions for generating high-fidelity and diverse images, often surpassing GANs under many circumstances. However, their slow inference speed hinders their potential for real-time applications. To address this, DiffusionGAN leveraged a conditional GAN to drastically reduce the denoising steps and speed up inference. Its advancement, Wavelet Diffusion, further accelerated the process by converting data into wavelet space, thus enhancing efficiency. Nonetheless, these models still fall short of GANs in terms of speed and image quality. To bridge these gaps, this paper introduces the Latent Denoising Diffusion GAN, which employs pre-trained autoencoders to compress images into a compact latent space, significantly improving inference speed and image quality. Furthermore, we propose a Weighted Learning strategy to enhance diversity and image…
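The sampling idea described in the abstract (a few GAN-driven denoising steps run in a compact latent space, then decoded by a pre-trained autoencoder) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the linear beta schedule, the step count, and the `toy_generator` and `toy_decoder` stand-ins are all hypothetical, and the loop follows the general denoising diffusion GAN recipe in which a generator predicts the clean sample and the next state is drawn from the Gaussian posterior q(x_{t-1} | x_t, x_0).

```python
import math
import random


def make_schedule(num_steps, beta_min=0.1, beta_max=0.9):
    """Linear beta schedule. An illustrative assumption, not the paper's schedule."""
    betas = [
        beta_min + (beta_max - beta_min) * i / max(num_steps - 1, 1)
        for i in range(num_steps)
    ]
    alpha_bars, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b  # cumulative product of (1 - beta)
        alpha_bars.append(prod)
    return betas, alpha_bars


def posterior_sample(x_t, x0_pred, t, betas, alpha_bars, rng):
    """Draw x_{t-1} ~ q(x_{t-1} | x_t, x0_pred); t is a 0-indexed step."""
    beta_t = betas[t]
    abar_t = alpha_bars[t]
    abar_prev = alpha_bars[t - 1] if t > 0 else 1.0
    coef_x0 = beta_t * math.sqrt(abar_prev) / (1.0 - abar_t)
    coef_xt = (1.0 - abar_prev) * math.sqrt(1.0 - beta_t) / (1.0 - abar_t)
    std = math.sqrt(beta_t * (1.0 - abar_prev) / (1.0 - abar_t))
    return [
        coef_x0 * x0 + coef_xt * xt + std * rng.gauss(0.0, 1.0)
        for x0, xt in zip(x0_pred, x_t)
    ]


def sample(generator, decoder, latent_dim, num_steps=4, seed=0):
    """Run a few denoising steps in latent space, then decode once."""
    rng = random.Random(seed)
    betas, alpha_bars = make_schedule(num_steps)
    x_t = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]  # start from noise
    for t in reversed(range(num_steps)):
        z = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]  # GAN noise input
        x0_pred = generator(x_t, z, t)  # generator predicts the clean latent
        x_t = posterior_sample(x_t, x0_pred, t, betas, alpha_bars, rng)
    return decoder(x_t)  # the autoencoder's decoder maps the latent to pixels


# Hypothetical stand-ins for the trained networks.
def toy_generator(x_t, z, t):
    return [0.5] * len(x_t)  # always predicts the same clean latent


def toy_decoder(latent):
    return [v for v in latent for _ in range(4)]  # pretend 4x upsampling


image = sample(toy_generator, toy_decoder, latent_dim=8)
```

Every denoising step touches only the `latent_dim` values rather than the full pixel grid, and the decoder runs once at the end, which is the intuition behind the speed-up the abstract describes.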

Code & Models

Repositories

thanhluantrinh/lddgan (PyTorch, official)

Taxonomy

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Diffusion