Iterative Importance Fine-tuning of Diffusion Models

Alexander Denker; Shreyas Padhy; Francisco Vargas; Johannes Hertrich

arXiv:2502.04468·cs.LG·February 12, 2026

Open Access·3 Reviews

TL;DR

This paper presents a self-supervised, iterative fine-tuning method for diffusion models. It learns the optimal control associated with Doob's $h$-transform so that conditional samples can be drawn in an amortised way, and it is demonstrated on class-conditional sampling, inverse problems and reward fine-tuning of text-to-image models.

Contribution

Introduces a novel self-supervised iterative fine-tuning algorithm for diffusion models that learns optimal control for improved posterior sampling.

Findings

01

Effective in class-conditional sampling tasks

02

Improves inverse problem solving with diffusion models

03

Enhances reward fine-tuning for text-to-image models

Abstract

Diffusion models are an important tool for generative modelling, serving as effective priors in applications such as imaging and protein design. A key challenge in applying diffusion models for downstream tasks is efficiently sampling from resulting posterior distributions, which can be addressed using Doob's $h$-transform. This work introduces a self-supervised algorithm for fine-tuning diffusion models by learning the optimal control, enabling amortised conditional sampling. Our method iteratively refines the control using a synthetic dataset resampled with path-based importance weights. We demonstrate the effectiveness of this framework on class-conditional sampling, inverse problems and reward fine-tuning for text-to-image diffusion models.
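The resampling step mentioned in the abstract can be illustrated with a deliberately simplified toy. The sketch below is not the paper's algorithm: it weights final samples rather than full diffusion paths, uses a standard normal as a stand-in for the pretrained model, and assumes a hypothetical Gaussian likelihood (`y`, `sigma`). It only shows how resampling proportional to importance weights turns prior draws into approximate posterior draws, which could then serve as a synthetic fine-tuning dataset.

```python
import math
import random


def importance_resample(samples, log_weights, k, rng):
    """Draw k samples with replacement, proportional to exp(log_weights)."""
    m = max(log_weights)  # subtract the max for numerical stability
    weights = [math.exp(lw - m) for lw in log_weights]
    return rng.choices(samples, weights=weights, k=k)


rng = random.Random(0)

# Stand-in for the pretrained model (assumption): draws from N(0, 1).
proposals = [rng.gauss(0.0, 1.0) for _ in range(20000)]

# Hypothetical observation model (assumption): y = x + noise, noise std 0.5.
y, sigma = 1.0, 0.5
log_w = [-(x - y) ** 2 / (2 * sigma ** 2) for x in proposals]

# Resampled set approximates the posterior, whose exact mean here is 0.8.
resampled = importance_resample(proposals, log_w, k=5000, rng=rng)
posterior_mean = sum(resampled) / len(resampled)
print(f"resampled mean: {posterior_mean:.3f}")
```

In this conjugate toy the exact posterior is Gaussian with mean 0.8 and variance 0.2, so the resampled mean can be checked against a known value. In the paper's setting the weights are path-based and the resampled data are used to refine the control iteratively.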

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01·Rating 6·Confidence 4

Strengths

1. The main contribution of the paper is the use of relaxed importance sampling (from Hertrich & Gruhlke, 2025) to bypass the requirement of a fine-tuning dataset (as in DEFT), which is a useful and important improvement over that algorithm. Theorem 6 and the empirical validation in Fig. 2 are useful for demonstrating that the relaxed importance sampling method doesn’t compromise the theoretical validity of the method.
2. The method has a benefit over other methods in not storing (or

Weaknesses

1) The use of the replay buffer requires further explanation or justification, since it is critical to the efficiency of the algorithm:
- Why is it reasonable to use a replay buffer? Shouldn’t trajectories be drawn on-policy (according to the current $h$) to have the correct importance weights computed according to eq. 14? (I am assuming the accepted samples are placed in the buffer, as described in Algorithm 1.)
- Notably, in other works (e.g. in Sendera et al., 2024), the use of a repla

Reviewer 02·Rating 4·Confidence 4

Strengths

Unlike prior methods such as RAFT that make use of importance sampling with the current iteration as the proposal generator, the authors derive a principled approach for fine-tuning. The authors propose a tractable accept/reject probability for use in the diffusion fine-tuning setting.
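To make the accept/reject idea concrete, here is a minimal sketch under stated assumptions. It assumes an acceptance probability of the form min(1, w / c) for an importance weight w and a threshold c; this form is an illustrative assumption, not the expression from the paper, which is not reproduced on this page. The proposal distribution and the Gaussian likelihood (`y`, `sigma`) are hypothetical toy choices.

```python
import math
import random


def accept_prob(log_w, log_c):
    """Acceptance probability min(1, w / c), computed in log space."""
    return math.exp(min(0.0, log_w - log_c))


def accept_reject(samples, log_weights, log_c, rng):
    """Keep each sample independently with probability min(1, w / c)."""
    return [x for x, lw in zip(samples, log_weights)
            if rng.random() < accept_prob(lw, log_c)]


data_rng = random.Random(0)
proposals = [data_rng.gauss(0.0, 1.0) for _ in range(20000)]  # toy proposal

# Hypothetical log-weights (assumption): Gaussian likelihood of y given x.
y, sigma = 1.0, 0.5
log_w = [-(x - y) ** 2 / (2 * sigma ** 2) for x in proposals]

# c >= max weight (here max w = 1): ordinary, unbiased rejection sampling.
exact = accept_reject(proposals, log_w, log_c=0.0, rng=random.Random(1))
# Smaller c: weights are capped at c, so more samples are accepted,
# but the accepted set only partially moves towards the target.
relaxed = accept_reject(proposals, log_w, log_c=-1.0, rng=random.Random(1))

exact_mean = sum(exact) / len(exact)
print(len(exact), len(relaxed), round(exact_mean, 3))
```

With c at or above the largest weight this reduces to standard rejection sampling, which is exact but can accept very few samples. Lowering c trades exactness for a higher acceptance rate, which is one reason an iterative scheme that repeats the step from an improved proposal is attractive.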

Weaknesses

1. The authors should engage with existing literature in a more thorough manner. For instance, iterative cross-entropy (De Boer et al., 2005), reinforced self-training (Gulcehre et al., 2023), and RAFT (Dong et al., 2023) all use importance sampling and rejection sampling for fine-tuning.
2. As baselines, the authors should include a comparison to sequential Monte Carlo methods, which in the low particle regime have been shown to outperform fine-tuned models.
3. The authors also make use of reward g

Reviewer 03·Rating 2·Confidence 4

Strengths

The paper in general is easy to follow. The proposed idea is simple and straightforward to implement.

Weaknesses

The main weakness of the current submission is that the efficiency and effectiveness of the proposed method remain unclear. The presentation lacks critical algorithmic details, making it difficult to understand how the method actually works. Moreover, the empirical comparisons to prior work are limited and not convincingly justified. See more details below.
- Important algorithmic details are not included in the main paper. The choice of hyperparameter $c$ should be important for the efficiency

Taxonomy

Topics·Advanced Mathematical Modeling in Engineering

Methods·Diffusion