Step-level Denoising-time Diffusion Alignment with Multiple Objectives
Qi Zhang, Dawei Wang, Shaofeng Zou

TL;DR
This paper introduces MSDDA, a retraining-free method for aligning diffusion models with multiple objectives. It directly computes the optimal denoising distributions and outperforms existing denoising-time approaches in numerical evaluations.
Contribution
The paper proposes a novel step-level RL formulation and the MSDDA framework, which together align diffusion models with multiple objectives without retraining and without approximation error.
Findings
MSDDA outperforms existing denoising-time approaches in numerical evaluations.
The denoising-time objective is proven to be equivalent to step-level RL fine-tuning.
The method computes optimal denoising distributions in closed form.
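The summary does not give MSDDA's closed-form update. The sketch below only illustrates why a closed form is plausible, under assumptions that are ours and not the paper's. We assume each single-objective aligned model's denoising step is Gaussian, N(mu_i, sigma^2 I), with a shared variance. We also assume the preference weights sum to 1. Under those assumptions the normalized weighted geometric mixture of the per-objective steps is again Gaussian, with mean sum_i w_i mu_i and variance sigma^2. The helper names (`fuse_gaussian_steps`, `mixture_log_density`, `gaussian_log_density`) are hypothetical.

```python
import numpy as np


def fuse_gaussian_steps(means, weights):
    """Mean of the normalized weighted geometric mixture prod_i N(mu_i, sigma^2 I)^{w_i}.

    Hypothetical helper. Assumes every per-objective denoising step shares the
    same isotropic variance sigma^2 and that the weights sum to 1, in which case
    the normalized mixture is N(sum_i w_i mu_i, sigma^2 I).
    """
    means = np.asarray(means, dtype=float)      # shape (k, d): one mean per objective
    weights = np.asarray(weights, dtype=float)  # shape (k,)
    if not np.isclose(weights.sum(), 1.0):
        raise ValueError("weights must sum to 1")
    return weights @ means                      # shape (d,)


def mixture_log_density(x, means, weights, sigma):
    """Unnormalized log-density sum_i w_i * log N(x; mu_i, sigma^2 I)."""
    sq = np.sum((np.asarray(means, dtype=float) - x) ** 2, axis=1)  # shape (k,)
    return -np.dot(weights, sq) / (2 * sigma**2)


def gaussian_log_density(x, mean, sigma):
    """Unnormalized log-density of N(mean, sigma^2 I)."""
    return -np.sum((x - mean) ** 2) / (2 * sigma**2)


# Toy example: two objectives, a 2-D latent, shared step std sigma.
means = np.array([[0.0, 0.0], [2.0, 4.0]])
weights = np.array([0.25, 0.75])
sigma = 0.5
fused = fuse_gaussian_steps(means, weights)  # weighted mean [1.5, 3.0]

# The two log-densities differ only by a constant in x, so once normalized
# they describe the same distribution.
rng = np.random.default_rng(0)
gaps = [
    mixture_log_density(x, means, weights, sigma) - gaussian_log_density(x, fused, sigma)
    for x in rng.standard_normal((5, 2))
]
x_next = fused + sigma * rng.standard_normal(2)  # one sampled denoising step
```

The design choice is to fuse in distribution space, by multiplying densities, rather than averaging model parameters. With Gaussian steps this reduces to a single weighted average of the predicted means, so each step needs no reward evaluations.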
Abstract
Reinforcement learning (RL) has emerged as a powerful tool for aligning diffusion models with human preferences, typically by optimizing a single reward function under a KL regularization constraint. In practice, however, human preferences are inherently pluralistic, and aligned models must balance multiple downstream objectives, such as aesthetic quality and text-image consistency. Existing multi-objective approaches either rely on costly multi-objective RL fine-tuning or on fusing separately aligned models at denoising time, but they generally require access to reward values (or their gradients) and/or introduce approximation error in the resulting denoising objectives. In this paper, we revisit the problem of RL fine-tuning for diffusion models and address the intractability of identifying the optimal policy by introducing a step-level RL formulation. Building on this, we further…
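The KL-regularized single-reward objective mentioned in the abstract has a well-known closed-form maximizer, pi*(x) proportional to pi_ref(x) exp(r(x)/beta). The sketch below shows this on a hypothetical finite outcome space; it does not reproduce the paper's step-level derivation. It also checks a second standard identity. If the preference weights sum to 1, the normalized weighted geometric mixture of the single-objective optima equals the optimum for the weighted-sum reward. The identity assumes each model is the exact KL-regularized optimum with a shared reference and a shared beta. The function names and reward values are illustrative only. The truncated abstract does not confirm whether the step-level formulation applies an analogous identity at each denoising step.

```python
import numpy as np


def kl_regularized_optimum(ref_probs, rewards, beta):
    """Maximizer of E_pi[r] - beta * KL(pi || pi_ref) on a finite set:
    pi*(x) proportional to pi_ref(x) * exp(r(x) / beta)."""
    logits = np.log(ref_probs) + np.asarray(rewards, dtype=float) / beta
    logits -= logits.max()  # subtract the max for numerical stability
    p = np.exp(logits)
    return p / p.sum()


def objective(p, ref_probs, rewards, beta):
    """Expected reward minus beta * KL(p || ref); assumes p > 0 everywhere."""
    return p @ rewards - beta * np.sum(p * np.log(p / ref_probs))


# Hypothetical toy problem: 3 outcomes, two reward functions.
ref = np.array([0.5, 0.3, 0.2])
r_aesthetic = np.array([1.0, 0.0, 2.0])
r_alignment = np.array([0.0, 1.5, 0.5])
w = np.array([0.6, 0.4])  # preference weights, summing to 1
beta = 0.5

p1 = kl_regularized_optimum(ref, r_aesthetic, beta)
p2 = kl_regularized_optimum(ref, r_alignment, beta)
joint = kl_regularized_optimum(ref, w[0] * r_aesthetic + w[1] * r_alignment, beta)

# Fusing the single-objective optima as a normalized weighted geometric
# mixture recovers the multi-objective optimum exactly (because sum(w) == 1).
fused = p1 ** w[0] * p2 ** w[1]
fused /= fused.sum()
```

At the optimum, the objective equals beta * log(sum_x pi_ref(x) exp(r(x)/beta)). This provides a quick numerical check that the closed form is indeed the maximizer.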
