Improved sampling via learned diffusions
Lorenz Richter, Julius Berner

TL;DR
This paper unifies and generalizes diffusion-based sampling methods through a Schrödinger bridge framework, introducing a new divergence-based loss that improves performance and addresses mode collapse.
Contribution
The paper presents a generalized variational framework for diffusion-based sampling that admits divergences other than the reverse Kullback-Leibler divergence, including a log-variance loss that improves performance.
Findings
The new framework unifies existing diffusion sampling methods.
The log-variance loss improves numerical stability and sampling quality.
The log-variance loss improves performance across multiple diffusion-based sampling approaches.
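As a rough illustration of the contrast between a reverse-KL objective and a log-variance objective, the sketch below assumes that both are estimated from per-trajectory values of the log density ratio between two path measures: the mean of that quantity for reverse KL, its sample variance for the log-variance loss. The function names, the synthetic Gaussian stand-in for simulated trajectories, and the constant `shift` are hypothetical and do not come from the paper's code.

```python
import numpy as np


def reverse_kl_estimate(log_ratio):
    """Monte Carlo mean of the per-trajectory log density ratio."""
    return float(np.mean(log_ratio))


def log_variance_estimate(log_ratio):
    """Sample variance of the per-trajectory log density ratio."""
    return float(np.var(log_ratio, ddof=1))


rng = np.random.default_rng(0)

# Hypothetical log density ratios, one per simulated trajectory.
# In practice these would come from integrating the controlled SDE;
# here synthetic Gaussian numbers stand in for them.
log_ratio = rng.normal(loc=0.3, scale=0.5, size=1024)

# Stand-in for an unknown additive constant, such as the log-normalizer
# of an unnormalized target density.
shift = 7.0

kl_plain = reverse_kl_estimate(log_ratio)
kl_shifted = reverse_kl_estimate(log_ratio + shift)
lv_plain = log_variance_estimate(log_ratio)
lv_shifted = log_variance_estimate(log_ratio + shift)

print(f"mean estimate:     {kl_plain:.4f} -> {kl_shifted:.4f} after shift")
print(f"variance estimate: {lv_plain:.4f} -> {lv_shifted:.4f} after shift")
```

Under these assumptions the variance-based estimate is unchanged by the additive constant, while the mean-based estimate moves by exactly that constant. The variance is also zero whenever the log ratio is the same on every trajectory, which is the situation in which the two path measures agree up to normalization.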
Abstract
Recently, a series of papers proposed deep learning-based approaches to sample from target distributions using controlled diffusion processes, being trained only on the unnormalized target densities without access to samples. Building on previous work, we identify these approaches as special cases of a generalized Schrödinger bridge problem, seeking a stochastic evolution between a given prior distribution and the specified target. We further generalize this framework by introducing a variational formulation based on divergences between path space measures of time-reversed diffusion processes. This abstract perspective leads to practical losses that can be optimized by gradient-based algorithms and includes previous objectives as special cases. At the same time, it allows us to consider divergences other than the reverse Kullback-Leibler divergence that is known to suffer from mode…
Peer Reviews
Decision: ICLR 2024 poster
In my understanding, the paper's contributions are clear, and I also consider that the results are essential for several reasons: 1. The paper proposes a novel framework that provides better understanding of previous literature on sampling problems using SDEs. 2. The paper well motivates the log-variance divergence-based methods so that readers can understand how each step contributes to the merits of the proposed method. 3. I found that the paper has a well-organized structure that makes it cl
In general, I find that the paper is well-written. However, descriptions of some derivations can be improved for clarification. For example, to introduce the derivation of Equation (10), the paper uses “defined in (2) with $u$ replaced by $r \in \mathcal{U}$”. I find this description a little confusing, as Equation 10 assumes three SDEs. Clarifying such descriptions would be helpful to potential readers who are not familiar with SDEs and relevant backgrounds.
- There is an extensive theoretical discussion of the provided method and its analytical properties.
- The authors provide a useful guide to organize recent works on diffusion-based approaches to density-based sampling.
- The work appears to be mathematically sophisticated and relatively rigorous.
- Limited validation. The generalizations proposed are compared on rather simplistic examples. Though the authors appear to target scenarios where data is not available, data-based modeling is clearly an important application of a diffusion-based sampler. Is there a reason the models are not compared to other non-diffusion methods, e.g., MCMC, normalizing flow, autoregressive, or GAN models?
- Unclear abstract: First, the abstract appears to claim that the authors "identify [diffusion models] a
The paper proposes a theoretical framework that encompasses several recently proposed methods for sampling via diffusions. This is a very interesting contribution, in particular from a theoretical point of view, since it allows a unified conceptualization of these methods. The proposed divergence seems to outperform the KL divergence, making it also an interesting contribution to the community.
[Update on Nov 22: the authors have answered the various points raised below in detail.] I found the paper quite dense, although I acknowledge that this is partly due to the nature of the contribution, which is to propose a unifying framework. In particular, Section 3 assumes precise knowledge of the various samplers recently proposed in the literature. While this is not a problem in itself, I would suggest, if space is lacking, moving part of the derivations of this section to the Appendix. **M
Taxonomy
Topics: Gaussian Processes and Bayesian Inference · Model Reduction and Neural Networks · Statistical Methods and Inference
Methods: Diffusion
