Aligning Few-Step Diffusion Models with Dense Reward Difference Learning

Ziyi Zhang; Li Shen; Sen Zhang; Deheng Ye; Yong Luo; Miaojing Shi; Dongjing Shan; Bo Du; Dacheng Tao

arXiv:2411.11727·cs.LG·March 2, 2026

TL;DR

This paper introduces SDPO, a reinforcement learning framework designed to improve the alignment of few-step diffusion models with specific objectives by using dense reward signals and novel optimization strategies.

Contribution

We propose a new RL framework, SDPO, that enhances few-step diffusion models with dense rewards, dual-state sampling, and stability techniques for better objective alignment.

Findings

1. SDPO outperforms existing methods in reward alignment across multiple tasks.
2. Dense reward strategies improve sample efficiency and policy updates.
3. Additional refinements enhance stability and long-term dependency handling.

Abstract

Few-step diffusion models enable efficient high-resolution image synthesis but struggle to align with specific downstream objectives due to limitations of existing reinforcement learning (RL) methods in low-step regimes with limited state spaces and suboptimal sample quality. To address this, we propose Stepwise Diffusion Policy Optimization (SDPO), a novel RL framework tailored for few-step diffusion models. SDPO introduces a dual-state trajectory sampling mechanism, tracking both noisy and predicted clean states at each step to provide dense reward feedback and enable low-variance, mixed-step optimization. For further efficiency, we develop a latent similarity-based dense reward prediction strategy to minimize costly dense reward queries. Leveraging these dense rewards, SDPO optimizes a dense reward difference learning objective that enables more frequent and granular policy updates.…
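To make the dual-state idea concrete, the following is a minimal sketch and not the authors' implementation. It runs a deterministic DDIM-style sampler over a handful of steps and records, at each step, both the noisy state and the predicted clean state, then scores the clean prediction to obtain a per-step (dense) reward. The clean prediction uses the standard DDPM identity x0_hat = (x_t - sqrt(1 - alpha_bar) * eps_hat) / sqrt(alpha_bar). The names `sample_dual_state`, `toy_eps_model`, and `toy_reward`, the noise schedule, and the three-dimensional "latent" are all assumptions made for illustration.

```python
import math
import random


def predict_clean(x_t, eps_hat, alpha_bar):
    """DDPM identity: x0_hat = (x_t - sqrt(1 - a) * eps_hat) / sqrt(a)."""
    scale = math.sqrt(alpha_bar)
    noise_scale = math.sqrt(1.0 - alpha_bar)
    return [(x - noise_scale * e) / scale for x, e in zip(x_t, eps_hat)]


def sample_dual_state(x_start, alpha_bars, eps_model, reward_fn):
    """Deterministic DDIM-style sampling that records both states per step.

    alpha_bars: cumulative signal levels in (0, 1), noisiest step first.
    eps_model, reward_fn: hypothetical stand-ins for the denoiser and
    the reward model.
    """
    x_t = list(x_start)
    trajectory = []
    for i, a_bar in enumerate(alpha_bars):
        eps_hat = eps_model(x_t, a_bar)
        x0_hat = predict_clean(x_t, eps_hat, a_bar)
        # Dual state: keep the noisy sample and its clean prediction,
        # and score the clean prediction for a dense, per-step reward.
        trajectory.append(
            {"noisy": x_t, "clean": x0_hat, "reward": reward_fn(x0_hat)}
        )
        # Move to the next noise level; the last step lands on x0_hat.
        a_next = alpha_bars[i + 1] if i + 1 < len(alpha_bars) else 1.0
        x_t = [
            math.sqrt(a_next) * c + math.sqrt(1.0 - a_next) * e
            for c, e in zip(x0_hat, eps_hat)
        ]
    return trajectory, x_t


# Toy setup: every value below is hypothetical.
target = [1.0, -2.0, 0.5]


def toy_eps_model(x_t, a_bar):
    # Imperfect denoiser: its clean guess blends the target with a
    # rescaled copy of the current noisy sample.
    guess = [0.8 * t + 0.2 * x / math.sqrt(a_bar) for t, x in zip(target, x_t)]
    return [
        (x - math.sqrt(a_bar) * g) / math.sqrt(1.0 - a_bar)
        for x, g in zip(x_t, guess)
    ]


def toy_reward(x0_hat):
    # Higher is better: negative squared distance to the target.
    return -sum((c - t) ** 2 for c, t in zip(x0_hat, target))


rng = random.Random(0)
x_start = [rng.gauss(0.0, 1.0) for _ in target]
trajectory, final = sample_dual_state(
    x_start, [0.05, 0.3, 0.7, 0.95], toy_eps_model, toy_reward
)
for step, record in enumerate(trajectory):
    print(step, round(record["reward"], 4))
```

With four sampling steps this yields four reward values rather than a single terminal one, which is the sense in which the feedback is dense. How SDPO turns these rewards into its reward difference objective is not specified in the text above, so the sketch stops at reward collection.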

Code & Models

Repositories

ziyizhang27/sdpo (PyTorch, official)

Taxonomy

Methods: Diffusion