Alternating Diffusion for Proximal Sampling with Zeroth Order Queries
Hirohane Takagi, Atsushi Nitanda

TL;DR
This paper presents a zeroth-order proximal sampler using alternating diffusion that directly simulates heat flow dynamics, enabling efficient sampling without auxiliary models or rejection steps, and demonstrates rapid convergence in practice.
Contribution
It introduces a novel approximate proximal sampling method that operates solely with zeroth-order information and directly simulates heat flow dynamics, avoiding rejection sampling.
Findings
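In experiments, the method converges rapidly to the target distribution; in theory, it inherits the exponential convergence of proximal sampling when the score estimation error is sufficiently controlled.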
It treats intermediate distributions as Gaussian mixtures for score estimation.
The approach allows flexible step sizes and deterministic runtime.
Abstract
This work introduces a new approximate proximal sampler that operates solely with zeroth-order information of the potential function. Prior theoretical analyses have revealed that proximal sampling corresponds to alternating forward and backward iterations of the heat flow. The backward step was originally implemented by rejection sampling, whereas we directly simulate the dynamics. Unlike diffusion-based sampling methods that estimate scores via learned models or by invoking auxiliary samplers, our method treats the intermediate particle distribution as a Gaussian mixture, thereby yielding a Monte Carlo score estimator from directly samplable distributions. Theoretically, when the score estimation error is sufficiently controlled, our method inherits the exponential convergence of proximal sampling under isoperimetric conditions on the target distribution. In practice, the algorithm…
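To make the abstract's description concrete, here is a minimal NumPy sketch of one way to alternate a forward heat-flow step with a simulated backward step using only evaluations of the potential. It is an illustration under stated assumptions, not the authors' algorithm: the score of the smoothed target is estimated with a generic self-normalized importance-sampling form of Tweedie's identity (which may differ from the paper's Gaussian-mixture estimator), the backward dynamics are discretized with plain Euler-Maruyama, and all function and parameter names (`smoothed_score`, `approx_backward_step`, `proximal_sampler`, `n_mc`, `n_steps`) are hypothetical.

```python
import numpy as np

def smoothed_score(f, y, t, n_mc, rng):
    """Zeroth-order estimate of grad log pi_t(y), where pi_t = pi * N(0, t I)
    and pi is proportional to exp(-f).

    Uses the identity grad log pi_t(y) = (E[X_0 | X_t = y] - y) / t, with the
    posterior mean estimated by self-normalized importance sampling from
    proposals x ~ N(y, t I) and weights proportional to exp(-f(x)).
    `f` is assumed to map arrays of shape (..., d) to shape (...).
    """
    m, d = y.shape
    x = y[:, None, :] + np.sqrt(t) * rng.standard_normal((m, n_mc, d))
    log_w = -f(x)                                  # only function values of f
    log_w -= log_w.max(axis=1, keepdims=True)      # stabilize the exponentials
    w = np.exp(log_w)
    w /= w.sum(axis=1, keepdims=True)
    post_mean = np.einsum("mn,mnd->md", w, x)
    return (post_mean - y) / t

def approx_backward_step(f, y, h, n_steps, n_mc, rng):
    """Euler-Maruyama simulation of dX = grad log pi_{h-s}(X) ds + dB on [0, h],
    started at X_0 = y (an assumed discretization, chosen for simplicity)."""
    dt = h / n_steps
    x = y.copy()
    for k in range(n_steps):
        t = h - k * dt                             # remaining smoothing time, >= dt
        drift = smoothed_score(f, x, t, n_mc, rng)
        x = x + dt * drift + np.sqrt(dt) * rng.standard_normal(x.shape)
    return x

def proximal_sampler(f, x0, h, n_iters, n_steps, n_mc, rng):
    """Alternate the forward heat flow (add N(0, h I) noise) with the
    approximate backward step, for a batch of particles x0 of shape (m, d)."""
    x = x0.copy()
    for _ in range(n_iters):
        y = x + np.sqrt(h) * rng.standard_normal(x.shape)
        x = approx_backward_step(f, y, h, n_steps, n_mc, rng)
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    f = lambda x: 0.5 * np.sum(x**2, axis=-1)      # standard Gaussian target
    x = proximal_sampler(f, np.full((500, 1), 3.0), h=1.0,
                         n_iters=8, n_steps=20, n_mc=64, rng=rng)
    print("sample mean:", x.mean(), "sample variance:", x.var())
```

For the standard Gaussian potential used in the usage example, the smoothed score is available in closed form as $-y/(1+t)$, which gives a simple sanity check for the estimator. The runtime is fixed by `n_iters`, `n_steps`, and `n_mc`, in contrast to a rejection-based backward step.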
Peer Reviews
Decision: ICLR 2026 Poster
Unlike proximal sampling based on rejection sampling, this paper approximately implements the RGO by simulating the backward SDE and the score function with a particle system (i.e., Monte Carlo simulation). The advantage (as reflected in the experiments) is that the step size $h$ of the proximal sampler can be taken large (as long as $T={\cal O}(h)$). In particular, $h$ does not depend on the dimension $d$ or on properties of $f$ such as the smoothness $L$. Moreover, the method does not require the first-orde
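For reference, the objects named in this review can be written out as follows. This is the standard formulation of the proximal sampler and its restricted Gaussian oracle (RGO), not a formula taken from the paper; the notation $\pi^X$, $\pi_t$, $x_k$, $y_k$ is assumed here for illustration.

```latex
\begin{align*}
\pi^X(x) &\propto e^{-f(x)}, \qquad \pi_t = \pi^X * \mathcal{N}(0,\, tI), \\
\text{forward step:}\quad y_k &\sim \mathcal{N}(x_k,\, hI), \\
\text{RGO (backward step):}\quad x_{k+1} &\sim \pi^{X\mid Y}(x \mid y_k)
  \propto \exp\!\Big(-f(x) - \tfrac{1}{2h}\,\|x - y_k\|^2\Big), \\
\text{backward SDE:}\quad d\bar X_s &= \nabla \log \pi_{h-s}(\bar X_s)\,ds + d\bar B_s,
  \qquad \bar X_0 = y_k,\quad s \in [0, h].
\end{align*}
```

Under this formulation, the law of $\bar X_h$ given $\bar X_0 = y_k$ is exactly the RGO distribution $\pi^{X\mid Y}(\cdot \mid y_k)$, so an accurate simulation of the backward SDE can stand in for rejection sampling.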
A major shortcoming is that the technical challenge addressed by the work is not very clear. Simulating SDEs with particle systems is a well-studied idea that has appeared frequently in the literature, so the methodological novelty seems limited, in view of the comparison with related works in Section 6. Moreover, the technical depth of the analysis is uncertain, as many of the proof techniques appear to be adapted from existing works, such as Vempala and Wibisono (2019). Another notable shortc
The method provides a means for implementing the RGO in the proximal sampler using only zeroth-order queries, which is a more general computational model than the standard gradient oracle model. Furthermore, it does not need any convoluted tricks (MALA + underdamped) to implement the proximal sampler, compared to prior theoretical proposals. The method appears to work extremely well in practice when compared to the standard implementation of the RGO. Generally, I suppose it is not too su
The theoretical guarantees are not particularly strong; I would be surprised if in the LSI setting this could improve upon the guarantees for the usual implementation of the RGO. Indeed, since the error scales as $1/N$ in the number of particles, we should expect $N \asymp \varepsilon^{-2}$, or polynomial in the accuracy (compared to the standard implementation). Of course, there are other drawbacks to the theory.
1. Although many components (proximal samplers, reverse diffusion Monte Carlo, and SMC) are drawn from prior work, the paper integrates them coherently to deliver a zero-order sampling scheme with provable convergence. The design is guided by a clear insight: Langevin-based methods inherently require gradients, whereas reverse-diffusion samplers can operate using only zero-order information.
2. The paper explicitly elucidates the close connection between proximal sampling and reverse diffusion,
1. The complexity analysis appears incomplete. The paper provides a one-step KL contraction bound with an additive term, but lacks an end-to-end error and cost analysis that composes these bounds over the full trajectory. A global, non-asymptotic complexity guarantee would substantially strengthen the contribution.
2. The theoretical results are largely asymptotic. In practice, performance with finite $M$ (particles) and $N$ (time steps) is crucial. The assumptions used to control the one-step e
Taxonomy
Topics: Markov Chains and Monte Carlo Methods · Gaussian Processes and Bayesian Inference · Model Reduction and Neural Networks
