Abstract
Given an unnormalized probability density, estimating its normalizing constant or free energy is a crucial problem in Bayesian statistics, statistical mechanics, and machine learning. It is especially challenging in high dimensions or when the target density is multimodal. To mitigate the high variance of conventional importance sampling estimators, annealing-based methods such as the Jarzynski equality and annealed importance sampling are commonly adopted, yet their quantitative complexity guarantees remain largely unexplored. We take a first step toward a non-asymptotic analysis of annealed importance sampling. In particular, we derive an oracle complexity bound for estimating the normalizing constant within a prescribed relative error with high probability,…
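The annealing idea in the abstract can be illustrated with a generic annealed importance sampling (AIS) sketch. This is a minimal, textbook-style version written for illustration, not the authors' algorithm: the standard Gaussian reference $V_0(x) = \lVert x\rVert^2/2$, the geometric path $V_\lambda = (1-\lambda)V_0 + \lambda V$, one unadjusted Langevin step per level, the function name `ais_log_normalizer`, and all parameter values are assumptions.

```python
import numpy as np


def ais_log_normalizer(V, grad_V, d, n_particles=2000, n_levels=2000,
                       step=0.01, seed=0):
    """Hypothetical AIS sketch: estimate log Z for pi(x) propto exp(-V(x)) on R^d.

    Assumes a standard Gaussian reference V0(x) = |x|^2 / 2 (log Z0 known) and
    the geometric path V_lam = (1 - lam) * V0 + lam * V.
    """
    rng = np.random.default_rng(seed)
    lams = np.linspace(0.0, 1.0, n_levels + 1)

    def V0(x):
        return 0.5 * np.sum(x**2, axis=-1)

    # Exact samples from the reference pi_0 = N(0, I_d).
    x = rng.standard_normal((n_particles, d))
    log_w = np.zeros(n_particles)
    for k in range(1, n_levels + 1):
        # Weight update: log of the unnormalized ratio pi_k(x) / pi_{k-1}(x).
        log_w -= (lams[k] - lams[k - 1]) * (V(x) - V0(x))
        # One unadjusted Langevin (LMC) step targeting pi_k. This kernel is only
        # approximately pi_k-invariant, so the estimate carries an O(step) bias.
        grad = (1.0 - lams[k]) * x + lams[k] * grad_V(x)
        x = x - step * grad + np.sqrt(2.0 * step) * rng.standard_normal(x.shape)

    log_z0 = 0.5 * d * np.log(2.0 * np.pi)
    # Numerically stable log-mean-exp of the importance weights.
    m = log_w.max()
    return log_z0 + m + np.log(np.mean(np.exp(log_w - m)))


if __name__ == "__main__":
    # Gaussian test target with a closed-form normalizing constant.
    d, s = 2, 0.5
    mu = np.ones(d)
    est = ais_log_normalizer(lambda x: np.sum((x - mu) ** 2, axis=-1) / (2 * s**2),
                             lambda x: (x - mu) / s**2, d)
    print("AIS estimate of log Z:", est)
    print("exact log Z:          ", 0.5 * d * np.log(2 * np.pi * s**2))
```

A Gaussian target is used only because its $\log Z$ is known exactly, which makes the estimate easy to check. Shrinking `step` while adding levels reduces the discretization bias of the Langevin moves; replacing them with an exactly invariant kernel (e.g. a Metropolis-adjusted step) would remove it.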
Peer Reviews
Decision: ICLR 2026 Poster
I think this is an interesting and useful result, providing a principled means of normalizing constant estimation with interpretable guarantees. I concur with the authors that this is an important problem in computational statistics. The connections between this result and annealed sampling algorithms are elegant and nicely parallel the corresponding connection in discrete settings, as the authors also note. The connections to statistical mechanics are always nice to see.
I find the action of a curve to be a somewhat inscrutable quantity from the perspective of algorithm design. The authors do a good job of making this quantity accessible to unfamiliar readers, but it certainly merits further investigation.
This is a very timely topic given the increasing presence of these algorithmic methods (especially Jarzynski's equality, which has been adapted for sampling with flow-based neural network models) and the limited theory for them. Normalizing constant estimation is a fundamental problem in ML and statistics, and the authors do a good job of giving examples of applications. The paper is technically well written, and I especially appreciate the careful treatment of the forward/backward SDEs. I expe…
The proof is written only for geometric interpolation. It would be better to have a general bound that works for an arbitrary annealing sequence and added drift, since this flexibility is often important in the literature, especially in approaches that learn the annealing sequence and drift (e.g. Máté and Fleuret, 2023; Albergo and Vanden-Eijnden, 2025). Would a similar proof work, or is the proof specialized to this annealing sequence? What changes would be necessary? Relatedly, woul…
I believe the main contribution of the paper, as the authors themselves state it, is the non-asymptotic analysis (in the number of path samples) of TI and AIS (LMC), presented in Theorems 2 and 4. Theorem 2 proves a constant-probability bound on the error of the normalizing-constant ratio estimated via the path integral of thermodynamic integration. Theorem 4 proves an upper bound on the number of oracle calls $M$ in AIS (LMC, Alg. 1).
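For readers unfamiliar with thermodynamic integration (TI), the identity behind it is $\log Z_1 - \log Z_0 = -\int_0^1 \mathbb{E}_{\pi_\lambda}[V - V_0]\,\mathrm{d}\lambda$ along the path $V_\lambda = (1-\lambda)V_0 + \lambda V$. The sketch below is a toy illustration under stated assumptions, not the paper's estimator: it assumes a Gaussian target and a standard Gaussian reference, so every $\pi_\lambda$ stays Gaussian and can be sampled exactly (in practice one would run MCMC at each $\lambda$); the function name `ti_log_normalizer_gaussian` and all parameters are hypothetical.

```python
import numpy as np


def ti_log_normalizer_gaussian(mu, s, n_lams=101, n_samples=20000, seed=0):
    """Hypothetical TI sketch for pi propto exp(-V), V(x) = |x - mu|^2 / (2 s^2).

    Reference V0(x) = |x|^2 / 2; geometric path V_lam = (1 - lam) V0 + lam V.
    Uses log Z_1 = log Z_0 - integral_0^1 E_{pi_lam}[V - V0] d lam.
    """
    rng = np.random.default_rng(seed)
    mu = np.asarray(mu, dtype=float)
    d = mu.size
    lams = np.linspace(0.0, 1.0, n_lams)
    integrand = np.empty(n_lams)
    for i, lam in enumerate(lams):
        # Toy-model shortcut: pi_lam is Gaussian with precision a and this mean.
        a = (1.0 - lam) + lam / s**2
        mean = (lam / s**2) * mu / a
        x = mean + rng.standard_normal((n_samples, d)) / np.sqrt(a)
        V = np.sum((x - mu) ** 2, axis=-1) / (2.0 * s**2)
        V0 = 0.5 * np.sum(x**2, axis=-1)
        # Monte Carlo estimate of E_{pi_lam}[V - V0].
        integrand[i] = np.mean(V - V0)
    # Trapezoidal rule over lam.
    integral = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(lams))
    log_z0 = 0.5 * d * np.log(2.0 * np.pi)
    return log_z0 - integral


if __name__ == "__main__":
    est = ti_log_normalizer_gaussian(np.ones(2), 0.5)
    print("TI estimate of log Z:", est, " exact:", np.log(2 * np.pi * 0.25))
```

The two error sources in this sketch are visible directly: Monte Carlo error in each expectation and quadrature error in the $\lambda$-integral, which are the kinds of terms a non-asymptotic TI analysis has to control.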
1 - Some important recent literature is missing from the work. How does your method compare to Adjoint sampler or to methods approximating the Kantorovich potential of the RDS?
2 - The organization of the paper is very synthetic. The paper is highly technical, lacks a properly structured background section, and presents multiple theoretical results without a straightforward relevance to each other. The notation is not clearly defined in the text ($B_t$ and $B_t^\leftarrow$ in Eq 2 and 4, $n$ i…
Taxonomy
Topics: Advanced Statistical Methods and Models
Methods: Diffusion
