Iterative Importance Fine-tuning of Diffusion Models
Alexander Denker, Shreyas Padhy, Francisco Vargas, Johannes Hertrich

TL;DR
This paper presents a self-supervised iterative fine-tuning method for diffusion models that improves conditional sampling efficiency by learning optimal control, applicable to various tasks like class-conditional sampling and inverse problems.
Contribution
Introduces a novel self-supervised iterative fine-tuning algorithm for diffusion models that learns optimal control for improved posterior sampling.
Findings
Effective in class-conditional sampling tasks
Improves inverse problem solving with diffusion models
Enhances reward fine-tuning for text-to-image models
Abstract
Diffusion models are an important tool for generative modelling, serving as effective priors in applications such as imaging and protein design. A key challenge in applying diffusion models for downstream tasks is efficiently sampling from resulting posterior distributions, which can be addressed using Doob's $h$-transform. This work introduces a self-supervised algorithm for fine-tuning diffusion models by learning the optimal control, enabling amortised conditional sampling. Our method iteratively refines the control using a synthetic dataset resampled with path-based importance weights. We demonstrate the effectiveness of this framework on class-conditional sampling, inverse problems and reward fine-tuning for text-to-image diffusion models.
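The resample-and-refit loop described in the abstract can be illustrated on a toy problem. The sketch below is not the paper's algorithm: it assumes a one-dimensional Gaussian prior in place of a diffusion model, a Gaussian proposal in place of the learned control, and weights computed on final samples rather than on whole paths. The function names and parameters (`iterative_importance_refit`, `noise_var`, and so on) are hypothetical.

```python
import numpy as np


def log_normal_pdf(x, mean, var):
    """Log-density of N(mean, var) evaluated at x."""
    return -0.5 * (np.log(2 * np.pi * var) + (x - mean) ** 2 / var)


def iterative_importance_refit(y=1.0, noise_var=1.0, n=50_000, iters=5, seed=0):
    """Toy stand-in for iterative importance fine-tuning (an assumption,
    not the authors' method).

    Prior:      x ~ N(0, 1)
    Likelihood: y | x ~ N(x, noise_var)
    Proposal:   N(mean, var), refit each round on resampled data.
    """
    rng = np.random.default_rng(seed)
    mean, var = 0.0, 1.0  # start with the proposal equal to the prior
    for _ in range(iters):
        # 1. Draw a synthetic dataset from the current proposal.
        x = rng.normal(mean, np.sqrt(var), size=n)
        # 2. Importance weights: posterior (up to a constant) over proposal.
        log_w = (
            log_normal_pdf(x, 0.0, 1.0)
            + log_normal_pdf(y, x, noise_var)
            - log_normal_pdf(x, mean, var)
        )
        w = np.exp(log_w - log_w.max())  # subtract max for numerical stability
        w /= w.sum()                     # self-normalise
        # 3. Resample the dataset according to the weights.
        resampled = x[rng.choice(n, size=n, replace=True, p=w)]
        # 4. "Fine-tune" the proposal by refitting it to the resampled data.
        mean, var = resampled.mean(), resampled.var()
    return mean, var


if __name__ == "__main__":
    m, v = iterative_importance_refit()
    print(f"fitted mean={m:.3f}, var={v:.3f}")
```

For this conjugate toy model the exact posterior is N(y / (1 + noise_var), noise_var / (1 + noise_var)), so with the defaults the fitted mean and variance should both approach 0.5, which gives a simple check that the resampling step is unbiased in the large-sample limit.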
Peer Reviews
Decision·Submitted to ICLR 2026
1. The main contribution of the paper is the use of relaxed importance sampling (from Hertrich & Gruhlke, 2025) to bypass the requirement of a fine-tuning dataset (as in DEFT), which is a useful and important improvement over that algorithm - Theorem 6, and the empirical validation in Fig 2 is useful for demonstrating that the relaxed importance sampling method doesn’t compromise the theoretical validity of the method. 2. The method has a benefit over other methods in not storing (or
1) The use of the replay buffer requires further explanation or justification, since it is critical to the efficiency of the algorithm: - Why is it reasonable to use a replay buffer? Shouldn’t trajectories be drawn on-policy (according to the current $h$) to have the correct importance weights computed according to eq. 14 (I am assuming the accepted samples are placed in the buffer, as described in Algorithm 1)? - Notably in other works (e.g. in Sendera et al., 2024), the use of a repla
Unlike prior methods such as RAFT that make use of importance sampling with the current iteration as the proposal generator, the authors derive a principled approach for fine-tuning. The authors propose a tractable accept/reject probability for use in the diffusion fine-tuning setting.
1. The authors should engage with existing literature in a more thorough manner. For instance, iterative cross-entropy (De Boer et al., 2005), reinforced self-training (Gulcehre et al., 2023), and RAFT (Dong et al., 2023) all use importance sampling and rejection sampling for fine-tuning. 2. As baselines, the authors should include a comparison to sequential Monte Carlo methods, which in the low particle regime have been shown to outperform fine-tuned models. 3. The authors also make use of reward g
The paper in general is easy to follow. The proposed idea is simple and straightforward to implement.
The main weakness of the current submission is that the efficiency and effectiveness of the proposed method remain unclear. The presentation lacks critical algorithmic details, making it difficult to understand how the method actually works. Moreover, the empirical comparisons to prior work are limited and not convincingly justified. See more details below, - Important algorithmic details are not included in the main paper. The choice of hyperparameter $c$ should be important for the efficiency
Taxonomy
Topics·Advanced Mathematical Modeling in Engineering
Methods·Diffusion
