TL;DR
This paper introduces a method for approximate posterior sampling with score-based generative models. The sampled distribution is provably close to the posterior of a noised prior in KL divergence and to the true posterior in Fisher divergence. The setting is motivated by image tasks such as super-resolution, stylization, and reconstruction.
Contribution
It provides the first formal polynomial-time algorithm for approximate posterior sampling that balances measurement consistency and prior fidelity.
Findings
Samples are close to the posterior of a noised prior in KL divergence.
Samples are close to the true posterior in Fisher divergence.
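The work is motivated by the empirical success of popular algorithms on image tasks (super-resolution, stylization, reconstruction), despite worst-case hardness of exact posterior sampling.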
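For reference, the two divergences named in the findings are commonly defined as below for a sampled distribution $q$ and a target $p$. The symbols $q$ and $p$ are notation introduced here for illustration, and the paper's exact normalization may differ.

```latex
\[
\mathrm{KL}(q \,\|\, p) = \int q(x)\,\log\frac{q(x)}{p(x)}\,dx,
\qquad
\mathrm{FI}(q \,\|\, p) = \int q(x)\,\bigl\|\nabla \log q(x) - \nabla \log p(x)\bigr\|^{2}\,dx .
\]
```

KL compares the densities themselves, while the Fisher divergence compares their scores, so a small value of one does not in general imply a small value of the other.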
Abstract
We study the problem of posterior sampling in the context of score-based generative models. We have a trained score network for a prior $p(x)$, a measurement model $p(y \mid x)$, and are tasked with sampling from the posterior $p(x \mid y)$. Prior work has shown this to be intractable in KL (in the worst case) under well-accepted computational hardness assumptions. Despite this, popular algorithms for tasks such as image super-resolution, stylization, and reconstruction enjoy empirical success. Rather than establishing distributional assumptions or restricted settings under which exact posterior sampling is tractable, we view this as a more general "tilting" problem of biasing a distribution towards a measurement. Under minimal assumptions, we show that one can tractably sample from a distribution that is simultaneously close to the posterior of a noised prior in KL divergence and the true…
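To make the setup concrete, here is a minimal toy sketch of posterior sampling by combining a prior score with a likelihood score. It is not the paper's algorithm: it assumes a 1D Gaussian prior (standing in for a trained score network), a Gaussian measurement model `y = x + noise`, and plain unadjusted Langevin dynamics. All function names and parameter values are hypothetical. It relies only on Bayes' rule, which gives the posterior score as the sum of the prior score and the gradient of the log-likelihood.

```python
import numpy as np

def prior_score(x, mu0=0.0, sigma0=1.0):
    # Score of an assumed Gaussian prior N(mu0, sigma0^2);
    # a stand-in for a trained score network.
    return -(x - mu0) / sigma0**2

def likelihood_score(x, y, sigma_y=0.5):
    # Gradient in x of log p(y | x) for the assumed model y = x + N(0, sigma_y^2).
    return (y - x) / sigma_y**2

def langevin_posterior_samples(y, n_samples=5000, n_steps=2000, step=1e-3, seed=0):
    # Unadjusted Langevin dynamics driven by the posterior score
    # (prior score + likelihood score), run on many chains in parallel.
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n_samples)
    for _ in range(n_steps):
        score = prior_score(x) + likelihood_score(x, y)
        noise = rng.standard_normal(n_samples)
        x = x + step * score + np.sqrt(2.0 * step) * noise
    return x

def exact_posterior(y, mu0=0.0, sigma0=1.0, sigma_y=0.5):
    # Closed-form Gaussian posterior (mean, variance) for this toy model.
    precision = 1.0 / sigma0**2 + 1.0 / sigma_y**2
    mean = (mu0 / sigma0**2 + y / sigma_y**2) / precision
    return mean, 1.0 / precision

samples = langevin_posterior_samples(y=1.0)
post_mean, post_var = exact_posterior(y=1.0)
print(samples.mean(), post_mean)
print(samples.var(), post_var)
```

In this conjugate Gaussian case the posterior is available in closed form, so the sample mean and variance can be checked directly. The hardness results mentioned in the abstract concern general priors, where no such shortcut exists.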
Peer Reviews
Decision·ICLR 2026 Poster
* The presentation is clear and the overview of the background is well-done. They do a good job of stating clearly the main contribution of the paper (dual guarantees of FI and KL).
* The approach is novel. It seems more normal to start with the prior and incorporate likelihood information (as with Bayesian updating), or to incorporate prior and likelihood information simultaneously (as with standard classifier-free guidance). But incorporating the likelihood information first and then incorpora
* Lack of empirical validation. Specifically, they have a guarantee with respect to a noised version of the prior, but it's unclear how significant that noising is for degrading the quality of the sample (e.g. the perceptual quality in image diffusion models).
* Related to the above: while there are polynomial guarantees in $d$ and $1/\epsilon$, the possibility of large hidden constants could be relevant to applications.
1. The authors provide a detailed and mathematically rigorous analysis for approximate posterior sampling using a pre-trained score model.
2. The paper does a good job of positioning itself relative to recent negative results (Gupta et al., 2024) and explaining the limitations of existing approaches.
3. The problem of developing provable methods for posterior sampling with diffusion models is of high importance to the community.
1. The paper makes no attempt to bridge the gap between its theoretical findings and practical applications. There are no experiments to illustrate the behavior of the algorithm or the meaning of the theoretical bounds. The claim that the method solves a practical problem is therefore not empirically supported.
2. The practical relevance of the results is limited by strong assumptions. For example, the requirement that the log-likelihood `R(x)` be convex is a major restriction that excludes ma
This is a well-written and technically sound paper with clear motivation, strong theoretical grounding, and solid empirical evidence. The methodology is both elegant and practical, providing a meaningful contribution to improving inference efficiency in diffusion models. The paper is well-organized and easy to follow, with a consistent logical flow from theoretical derivation to experimental validation.
I have only two minor concerns that do not undermine the overall quality of the paper.
1. At line 159, the authors claim that two processes have the same joint distribution. However, this statement may not be strictly accurate: one process is a Markov process while the other is an 'inverse' Markov process, and although they share the same marginal distribution at each fixed time $t$, their joint distributions are not identical. This point should be clarified for mathematical precision.
2. At
