Lossy Image Compression with Conditional Diffusion Models
Ruihan Yang, Stephan Mandt

TL;DR
This paper introduces a lossy image compression method based on conditional diffusion models. It outperforms GAN-based approaches on perceptual quality (FID) and is competitive with VAE-based models on distortion metrics, while producing high-quality reconstructions in few decoding steps.
Contribution
It presents a diffusion-based decoder for image compression that improves perceptual quality and decoding efficiency over existing neural compression methods.
Findings
Stronger FID scores than GAN-based models.
Competitive performance with VAE-based models on distortion metrics.
High-quality reconstructions with few decoding steps.
Abstract
This paper outlines an end-to-end optimized lossy image compression framework using diffusion generative models. The approach relies on the transform coding paradigm, where an image is mapped into a latent space for entropy coding and, from there, mapped back to the data space for reconstruction. In contrast to VAE-based neural compression, where the (mean) decoder is a deterministic neural network, our decoder is a conditional diffusion model. Our approach thus introduces an additional "content" latent variable on which the reverse diffusion process is conditioned and uses this variable to store information about the image. The remaining "texture" variables characterizing the diffusion process are synthesized at decoding time. We show that the model's performance can be tuned toward perceptual metrics of interest. Our extensive experiments involving multiple datasets and image…
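The pipeline described in the abstract can be illustrated with a toy sketch. This is not the authors' architecture: the linear transforms `W_enc` and `W_dec`, the function `toy_denoiser`, the noise schedule, and the deterministic DDIM-style update are all assumptions chosen to keep the example small and runnable. It only shows the data flow: an image is mapped to a quantized "content" latent, and the decoder starts from freshly sampled "texture" noise and refines it over a few steps while conditioning on that latent.

```python
import numpy as np

def encode(x, W_enc):
    """Analysis transform plus rounding (a stand-in for quantization before entropy coding)."""
    return np.round(W_enc @ x)

def toy_denoiser(x_t, z_hat, W_dec):
    """Hypothetical stand-in for a trained conditional network.
    It predicts the clean image from the content latent alone and ignores x_t."""
    return W_dec @ z_hat

def decode(z_hat, W_dec, num_steps=5, seed=0):
    """Few-step deterministic reverse process conditioned on z_hat (assumed DDIM-style update)."""
    rng = np.random.default_rng(seed)
    # Assumed schedule: signal level rises from almost pure noise (0.01) to clean (1.0).
    alpha_bar = np.linspace(0.01, 1.0, num_steps + 1)
    # "Texture" variables are not transmitted; they are sampled at decoding time.
    x_t = rng.standard_normal(W_dec.shape[0])
    for i in range(num_steps):
        a_t, a_next = alpha_bar[i], alpha_bar[i + 1]
        x0_pred = toy_denoiser(x_t, z_hat, W_dec)
        # Noise implied by the current sample and the clean-image prediction.
        eps_pred = (x_t - np.sqrt(a_t) * x0_pred) / np.sqrt(1.0 - a_t)
        # Move to the next (less noisy) level.
        x_t = np.sqrt(a_next) * x0_pred + np.sqrt(1.0 - a_next) * eps_pred
    return x_t

# Usage: a 4-dimensional "image" compressed to a 2-dimensional content latent.
rng = np.random.default_rng(1)
W_enc = rng.standard_normal((2, 4))
W_dec = rng.standard_normal((4, 2))
x = rng.standard_normal(4)
z_hat = encode(x, W_enc)      # integer-valued latent, ready for entropy coding
x_rec = decode(z_hat, W_dec)  # reconstruction after 5 reverse steps
```

Because the toy denoiser ignores the noisy input, the reconstruction here collapses to `W_dec @ z_hat`. In a trained model the network would also use `x_t`, so the sampled noise would influence the reconstructed texture.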
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Advanced Image Processing Techniques · Advanced Data Compression Techniques
