Diffusion Alignment Beyond KL: Variance Minimisation as Effective Policy Optimiser
Zijing Ou, Jacob Si, Junyi Zhu, Ondrej Bohdal, Mete Ozay, Taha Ceritli, Yingzhen Li

TL;DR
This paper introduces Variance Minimisation Policy Optimisation (VMPO), an approach to diffusion alignment that minimises the variance of log importance weights instead of directly optimising a KL-based objective, and that offers a common lens on existing alignment methods.
Contribution
The paper proposes VMPO, a variance-based objective for diffusion alignment, proves that it is minimised by the reward-tilted target distribution, and presents a unified framework that covers KL-based methods and extends beyond them.
Findings
The variance objective is minimised by the reward-tilted target distribution.
Under on-policy sampling, the gradient of the variance objective coincides with that of standard KL-based alignment.
VMPO generalises existing diffusion alignment methods and suggests new alignment strategies.
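The first finding can be illustrated in a simplified single-step setting. The notation below ($p_{\mathrm{ref}}$, $r$, $\beta$, $q_\theta$, $\mu$) is assumed for illustration and is not taken from the paper, which works along the full denoising trajectory; this is a sketch of the underlying idea rather than the paper's proof.

```latex
% Assumed single-step notation (not taken from the paper):
%   p_ref : pretrained model, r : reward, beta > 0 : temperature,
%   q_theta : fine-tuned model, mu : sampling distribution.
\begin{align*}
\pi^*(x) &= \frac{1}{Z}\, p_{\mathrm{ref}}(x)\, \exp\!\big(r(x)/\beta\big),
\qquad Z = \sum_x p_{\mathrm{ref}}(x)\, \exp\!\big(r(x)/\beta\big), \\
\log w_\theta(x) &= \log p_{\mathrm{ref}}(x) + \frac{r(x)}{\beta} - \log q_\theta(x), \\
q_\theta = \pi^* \;&\Longrightarrow\; \log w_\theta(x) = \log Z \ \text{for all } x
\;\Longrightarrow\; \operatorname{Var}_{x \sim \mu}\big[\log w_\theta(x)\big] = 0 .
\end{align*}
```

Conversely, if $\mu$ has full support and the variance is zero, then $\log w_\theta$ equals some constant $c$ everywhere, so $q_\theta \propto p_{\mathrm{ref}} \exp(r/\beta)$, and normalisation forces $q_\theta = \pi^*$.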
Abstract
Diffusion alignment adapts pretrained diffusion models to sample from reward-tilted distributions along the denoising trajectory. This process naturally admits a Sequential Monte Carlo (SMC) interpretation, where the denoising model acts as a proposal and reward guidance induces importance weights. Motivated by this view, we introduce Variance Minimisation Policy Optimisation (VMPO), which formulates diffusion alignment as minimising the variance of log importance weights rather than directly optimising a Kullback-Leibler (KL) based objective. We prove that the variance objective is minimised by the reward-tilted target distribution and that, under on-policy sampling, its gradient coincides with that of standard KL-based alignment. This perspective offers a common lens for understanding diffusion alignment. Under different choices of potential functions and variance minimisation…
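The two claims in the abstract can be checked numerically on a toy problem. The sketch below is a minimal illustration, not the paper's method. It assumes a single-step, discrete analogue of diffusion alignment: a softmax "policy" over six states, a random reference distribution `p_ref`, hypothetical `reward` values, and a hypothetical temperature `beta`. All names are introduced here for illustration. It compares finite-difference gradients of half the log-weight variance (with the sampling distribution held fixed at the current policy) against those of the KL divergence to the reward-tilted target; the factor of one half is specific to this toy formulation.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 6                                    # number of states in the toy space
p_ref = rng.dirichlet(np.ones(K))        # stand-in for the pretrained model
reward = rng.normal(size=K)              # hypothetical reward values
beta = 0.5                               # hypothetical temperature

# Unnormalised log target: log p_ref(x) + r(x) / beta
log_tilted = np.log(p_ref) + reward / beta
target = np.exp(log_tilted - log_tilted.max())
target /= target.sum()                   # reward-tilted target distribution


def softmax(theta):
    z = np.exp(theta - theta.max())
    return z / z.sum()


def log_weights(theta):
    # Log importance weight of each state under the policy softmax(theta)
    return log_tilted - np.log(softmax(theta))


def log_variance(theta, mu):
    # Variance of the log weights under a fixed sampling distribution mu
    lw = log_weights(theta)
    mean = np.sum(mu * lw)
    return np.sum(mu * (lw - mean) ** 2)


def kl_to_target(theta):
    # KL(q_theta || target)
    q = softmax(theta)
    return np.sum(q * (np.log(q) - np.log(target)))


def finite_diff_grad(f, theta, eps=1e-6):
    # Central finite differences, one coordinate at a time
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step[i] = eps
        grad[i] = (f(theta + step) - f(theta - step)) / (2 * eps)
    return grad


theta = rng.normal(size=K)
mu = softmax(theta)  # on-policy samples; held fixed, so no gradient flows through mu

grad_var = finite_diff_grad(lambda t: 0.5 * log_variance(t, mu), theta)
grad_kl = finite_diff_grad(kl_to_target, theta)

# Variance of the log weights when the policy equals the tilted target
var_at_target = log_variance(np.log(target), target)

print("max gradient gap:", np.max(np.abs(grad_var - grad_kl)))
print("variance at target:", var_at_target)
```

Holding `mu` fixed while differentiating mimics a stop-gradient on the sampling distribution. If the gradient were also taken through `mu`, the two objectives would generally no longer share a gradient in this toy setting.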
Taxonomy
Topics: Functional Brain Connectivity Studies · Generative Adversarial Networks and Image Synthesis · Stochastic Gradient Optimization Techniques
