Training-Free Rate-Distortion-Perception Traversal With Diffusion
Yuhan Wang, Suzhi Bi, and Ying-Jun Angela Zhang

TL;DR
This paper introduces a training-free method using pre-trained diffusion models to navigate the entire rate-distortion-perception tradeoff in lossy compression, combining theoretical optimality with empirical flexibility.
Contribution
It presents a novel, training-free framework that leverages diffusion models and a reverse channel coding module to adaptively traverse the RDP surface without retraining.
Findings
The diffusion decoder is proven optimal for the distortion-perception tradeoff under additive white Gaussian noise (AWGN) observations.
Combined with the RCC module, the framework achieves the optimal RDP function in the Gaussian case.
Empirical results show effective navigation of the RDP tradeoff across datasets.
Abstract
The rate-distortion-perception (RDP) tradeoff characterizes the fundamental limits of lossy compression by jointly considering bitrate, reconstruction fidelity, and perceptual quality. While recent neural compression methods have improved perceptual performance, they typically operate at fixed points on the RDP surface, requiring retraining to target different tradeoffs. In this work, we propose a training-free framework that leverages pre-trained diffusion models to traverse the entire RDP surface. Our approach integrates a reverse channel coding (RCC) module with a novel score-scaled probability flow ODE decoder. We theoretically prove that the proposed diffusion decoder is optimal for the distortion-perception tradeoff under AWGN observations and that the overall framework with the RCC module achieves the optimal RDP function in the Gaussian case. Empirical results across multiple…
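The role of a score scale in a probability flow ODE decoder can be illustrated in the scalar Gaussian case. The sketch below is not the paper's decoder: it assumes a source X ~ N(0, var_x), an AWGN observation Y = X + sqrt(t)·N, a variance-exploding schedule whose noise variance equals the time variable s, and a scale rho that simply multiplies the score. Perception is measured here by the squared 2-Wasserstein distance, which is also an assumption. The function names scaled_pf_ode_decode and gaussian_dp_point are hypothetical.

```python
import math


def scaled_pf_ode_decode(y, t, var_x=1.0, rho=1.0, steps=2000):
    """Euler-integrate dx/ds = -(rho / 2) * score_s(x) from s = t down to s = 0.

    Assumption: the marginal at noise level s is N(0, var_x + s), so
    score_s(x) = -x / (var_x + s). rho is a hypothetical score scale.
    """
    x = y
    ds = t / steps
    s = t
    for _ in range(steps):
        score = -x / (var_x + s)
        dx_ds = -(rho / 2.0) * score
        x = x - ds * dx_ds  # step backward in s
        s -= ds
    return x


def gaussian_dp_point(t, var_x=1.0, rho=1.0):
    """Closed-form gain, MSE distortion, and squared W2 perception gap.

    The ODE above is linear, so the decoder is x_hat = gain * y with
    gain = (var_x / (var_x + t)) ** (rho / 2).
    """
    gain = (var_x / (var_x + t)) ** (rho / 2.0)
    # E[(X - gain * Y)^2] with E[XY] = var_x and E[Y^2] = var_x + t
    distortion = var_x - 2.0 * gain * var_x + gain ** 2 * (var_x + t)
    # Squared W2 distance between N(0, var_x) and N(0, gain^2 * (var_x + t))
    w2_sq = (math.sqrt(var_x) - gain * math.sqrt(var_x + t)) ** 2
    return gain, distortion, w2_sq


if __name__ == "__main__":
    for rho in (1.0, 1.5, 2.0):
        gain, dist, perc = gaussian_dp_point(t=1.0, var_x=1.0, rho=rho)
        numeric = scaled_pf_ode_decode(1.0, t=1.0, var_x=1.0, rho=rho)
        print(f"rho={rho}: gain={gain:.4f} (Euler {numeric:.4f}), "
              f"distortion={dist:.4f}, W2^2={perc:.4f}")
```

Under these assumptions, rho = 1 gives the gain sqrt(var_x / (var_x + t)), which matches the source distribution exactly (zero perception gap), while rho = 2 gives the gain var_x / (var_x + t), which is the MMSE estimator. Intermediate values of rho trade distortion against perception at a fixed noise level t.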
Peer Reviews
Decision · Submitted to ICLR 2026
The paper studies an interesting problem of rate-distortion-perception tradeoff.
I believe that the paper has several major technical issues, and its contributions are not sufficiently compelling when compared to prior work such as Blau & Michaeli, Salehkalaibar et al., and Zhang et al. 1) The theoretical results lack sufficient rigor. For example, in the proof of Theorem 4, a reverse channel coding encoder is paired with a score-based decoder, where the central challenge is to establish a one-shot result for this encoder-decoder combination. However, the proof simply assum
- The paper's motivation of traversing the RDP tradeoff [1] is clear and, in my opinion, important.
- This paper continues the recent trend (DiffC [2], PSC [3], DDCM [4]) of using a single pre-trained model to support multiple bitrates in a flexible manner, which is also very important in this reviewer's opinion. Other methods such as HiFiC, which need a differently trained model for every new bitrate, make neural compression less practical in the long run for edge devices.
- The idea of basically combi
- The key concept - traversing the RDP curve by using a combined solution of the MMSE (distortion-optimal) and perfect-perception reconstructions - is the main idea in [5]. That work formalized the interpolation mechanism and proved its optimality. While here the combination is of the score (and not the estimator), this manuscript does not acknowledge them at all. Specifically, as the optimality in this paper is concerned with the scalar Gaussian case - are their results (and therefore the optim
Training-Free and Generalizable: It directly leverages pre-trained diffusion models without requiring retraining for different rates or trade-offs. A single model, controlled by the two knobs t and ρ, covers a wide R-D-P region, enabling a perception-distortion trade-off.
Rigorous Theoretical Proofs: The paper provides extensive theoretical proofs, significantly enhancing its readability and credibility.
1. Perceived Lack of Innovation: Overall, the main innovation arguably lies in introducing the parameter ρ on top of the DiffC [1] framework to control the perception-distortion trade-off. While it successfully achieves this trade-off, the core concept may not be considered highly novel, as it does not address the fundamental limitations of this class of methods.
2. High Computational Complexity and Slow Inference: Similar to DiffC [1], this method inherently suffers from significant encoding and
Taxonomy
Topics: Advanced Data Compression Techniques · Image and Video Quality Assessment · Video Coding and Compression Technologies
