Distilling Diffusion Models into Conditional GANs

Minguk Kang; Richard Zhang; Connelly Barnes; Sylvain Paris; Suha Kwak; Jaesik Park; Eli Shechtman; Jun-Yan Zhu; Taesung Park

arXiv:2405.05967 · cs.CV · July 19, 2024 · 2 cites

Open Access

TL;DR

This paper distills a multistep diffusion model into a fast, single-step conditional GAN. It treats distillation as paired image-to-image translation over noise-to-image pairs and uses a perceptual loss computed in latent space, cutting inference time while preserving image quality.

Contribution

The authors frame diffusion distillation as a paired image-to-image translation task and introduce E-LatentLPIPS, a perceptual loss that operates directly in the diffusion model's latent space with an ensemble of augmentations, making the regression loss efficient to compute.

Findings

01  Outperforms existing one-step diffusion distillation models on the COCO benchmark

02  E-LatentLPIPS converges faster than other perceptual losses

03  The method maintains high image quality with significantly reduced inference time

Abstract

We propose a method to distill a complex multistep diffusion model into a single-step conditional GAN student model, dramatically accelerating inference, while preserving image quality. Our approach interprets diffusion distillation as a paired image-to-image translation task, using noise-to-image pairs of the diffusion model's ODE trajectory. For efficient regression loss computation, we propose E-LatentLPIPS, a perceptual loss operating directly in diffusion model's latent space, utilizing an ensemble of augmentations. Furthermore, we adapt a diffusion model to construct a multi-scale discriminator with a text alignment loss to build an effective conditional GAN-based formulation. E-LatentLPIPS converges more efficiently than many existing distillation methods, even accounting for dataset construction costs. We demonstrate that our one-step generator outperforms cutting-edge one-step…
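The idea of a perceptual loss "utilizing an ensemble of augmentations" can be illustrated with a small sketch. This is not the authors' E-LatentLPIPS. It is a toy under stated assumptions: the latent is a single-channel 2D list, `toy_features` is a hypothetical fixed filter standing in for a learned latent-space feature network, and the augmentation set (identity and flips) is chosen only for illustration. What it shows is the general pattern: draw a random augmentation, apply the same one to both the prediction and the target, compare them in feature space, and average over several draws.

```python
import random

# Assumption: a latent is one channel of shape H x W stored as nested lists.
Latent = list[list[float]]


def identity(z: Latent) -> Latent:
    return z


def hflip(z: Latent) -> Latent:
    """Mirror each row left to right."""
    return [row[::-1] for row in z]


def vflip(z: Latent) -> Latent:
    """Reverse the order of the rows."""
    return z[::-1]


# Hypothetical augmentation set, picked for this sketch only.
AUGMENTATIONS = [identity, hflip, vflip]


def toy_features(z: Latent) -> Latent:
    """Stand-in for a learned feature network: a fixed asymmetric 1x2 filter."""
    return [[row[j] - 0.5 * row[j + 1] for j in range(len(row) - 1)] for row in z]


def feature_distance(a: Latent, b: Latent) -> float:
    """Mean squared difference between the toy features of two latents."""
    fa, fb = toy_features(a), toy_features(b)
    diffs = [(x - y) ** 2 for ra, rb in zip(fa, fb) for x, y in zip(ra, rb)]
    return sum(diffs) / len(diffs)


def ensembled_distance(pred: Latent, target: Latent,
                       n_samples: int = 4, seed: int = 0) -> float:
    """Average the feature distance over randomly drawn augmentations."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        aug = rng.choice(AUGMENTATIONS)
        # The same augmentation is applied to both inputs before comparing.
        total += feature_distance(aug(pred), aug(target))
    return total / n_samples
```

Identical latents give a distance of zero under every augmentation, while any mismatch contributes through several augmented views instead of a single one. In an actual training setup the stand-in filter would be replaced by a trained network and the lists by tensors.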

Taxonomy

Topics: Machine Learning in Healthcare · Speech Recognition and Synthesis

Methods: Diffusion