Progressive Compression with Universally Quantized Diffusion Models
Yibo Yang, Justus C. Will, Stephan Mandt

TL;DR
This paper introduces a novel diffusion-based compression method that uses uniform noise and universal quantization, enabling progressive image coding with competitive rate-distortion performance.
Contribution
It proposes a new diffusion model whose forward process uses uniform rather than Gaussian noise, so that its negative ELBO corresponds to the end-to-end compression cost under universal quantization, and demonstrates its effectiveness in progressive image coding.
Findings
Achieves competitive rate-distortion results across various bit-rates.
Enables incremental transmission and decoding with progressively better quality.
Brings neural codecs closer to practical deployment.
Abstract
Diffusion probabilistic models have achieved mainstream success in many generative modeling tasks, from image generation to inverse problem solving. A distinct feature of these models is that they correspond to deep hierarchical latent variable models optimizing a variational evidence lower bound (ELBO) on the data likelihood. Drawing on a basic connection between likelihood modeling and compression, we explore the potential of diffusion models for progressive coding, resulting in a sequence of bits that can be incrementally transmitted and decoded with progressively improving reconstruction quality. Unlike prior work based on Gaussian diffusion or conditional diffusion models, we propose a new form of diffusion model with uniform noise in the forward process, whose negative ELBO corresponds to the end-to-end compression cost using universal quantization. We obtain promising first…
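The universal quantization step that the abstract refers to can be illustrated with a small sketch. This is not the authors' implementation: it is the textbook dithered quantizer applied to a toy 1-D Gaussian source, and the names `uq_encode`, `uq_decode`, `delta`, and `u` are hypothetical choices made for this example. Encoder and decoder are assumed to share the dither `u` (for example via a common random seed). The point it shows is that the reconstruction error then behaves like additive uniform noise on an interval of width `delta`, regardless of the input.

```python
import numpy as np

def uq_encode(x, delta, u):
    """Quantize x with shared dither u; the integer indices are what would be entropy-coded."""
    return np.round(x / delta + u).astype(np.int64)

def uq_decode(k, delta, u):
    """Undo the dither using the same u that the encoder used."""
    return (k - u) * delta

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)              # toy source, an assumption for illustration
delta = 0.5                               # width of the uniform noise interval
u = rng.uniform(-0.5, 0.5, size=x.shape)  # dither shared by encoder and decoder

k = uq_encode(x, delta, u)
y = uq_decode(k, delta, u)
err = y - x

# The error is bounded by delta / 2, and its variance is close to delta**2 / 12,
# the variance of a uniform distribution on an interval of width delta.
print("max |err|:", np.abs(err).max())
print("var(err):", err.var(), "vs delta^2/12:", delta**2 / 12)
print("corr(x, err):", np.corrcoef(x, err)[0, 1])
```

In this sketch the decoder output `y` is distributed like `x` plus uniform noise, which is the kind of channel a uniform-noise forward process describes. How the indices `k` are entropy-coded and how `delta` is chosen per diffusion step are left out here.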
Peer Reviews
Decision: ICLR 2025 Oral
The primary strength of the paper is recognizing that diffusion models can be reformulated to use uniform noise, that this allows them to use universal quantization (UQ), and that UQ is an efficient solution to the REC problem for a uniform noise channel. The authors then derive and implement this model, which they call a "universally quantized diffusion model" (UQDM), and evaluate it empirically to show that the theoretical benefits translate to practice as shown by the RD curve comparisons in
Although there are many benefits to the proposed approach, it is not strictly better than previous methods at all bit rates and metrics (PSNR and FID are reported). The comparison is complicated somewhat due to a difference in capabilities, e.g., CDC has better FID than UQDM at low bit rates, but UQDM is progressive while CDC is not. Similarly, UQDM is fairly far behind the (theoretical) quality of VDM, but UQDM provides significant real-world benefits in terms of computational requirements. In
- The paper essentially introduces a formulation of diffusion models using uniform (as opposed to Gaussian) noise, effectively turning diffusion models into a method that through end-to-end optimization directly approximates the quantile function of the data distribution, which can be broadly useful beyond the context of compression.
- The mathematical background and literature review are well-presented.
- Even if encoding and decoding uses significantly more FLOPs compared to common non-neural
The biggest weakness seems to be a lack of comparisons against other diffusion-based lossy compression methods from prior work on the image compression tasks. E.g., DDPM is open sourced so this should be trivial to try. I acknowledge that at least two neural codec baselines (CTC, CDC) are included so this is not a fatal weakness. Still, positive results here would further motivate why we would want to follow the proposed method as opposed to prior ones. Particularly at low bpp rates, bpg which
- This paper presents an interesting idea with relevant contributions for the community. The focus on improving compression with general-purpose diffusion models is exciting and relevant.
- The theory developed around uniform-noise diffusion, along with its connection to standard Gaussian diffusion in the continuous-time limit, is well-articulated and adds depth to the work.
- The results seem quite promising.
- **Clarity Issues in Background**: The paper could be clearer, especially in the background section. It would help if it were more accessible to readers who aren't already familiar with probabilistic generative models for compression. For example:
  - It would be useful to give a more intuitive explanation of transmitting a sample from $q$ with close to KL nats, as this is central to the argument about the NELBO corresponding to the lossless coding cost.
  - Some wording is a bit vague, like "u
Taxonomy
Topics: Advanced Data Compression Techniques
Methods: Diffusion
