TDM-R1: Reinforcing Few-Step Diffusion Models with Non-Differentiable Reward

Yihong Luo; Tianyang Hu; Weijian Luo; Jing Tang

arXiv:2603.07700·cs.CV·March 10, 2026

TL;DR

TDM-R1 introduces a novel reinforcement learning framework that effectively incorporates non-differentiable rewards into few-step diffusion models, significantly enhancing their generative quality and alignment capabilities.

Contribution

It proposes a decoupled RL paradigm with practical methods for reward signal extraction, enabling the use of non-differentiable rewards in few-step generative models.

Findings

01 Achieves state-of-the-art RL performance on in-domain and out-of-domain metrics.

02 Outperforms existing models with only 4 NFEs on Z-Image.

03 Improves text-to-image quality and preference alignment.

Abstract

While few-step generative models have enabled powerful image and video generation at significantly lower cost, a general reinforcement learning (RL) paradigm for few-step models remains an open problem. Existing RL approaches for few-step diffusion models rely heavily on back-propagating through differentiable reward models, which excludes most important real-world reward signals, e.g., non-differentiable rewards such as binary human like/dislike judgments, object counts, etc. To properly incorporate non-differentiable rewards into few-step generative models, we introduce TDM-R1, a novel reinforcement learning paradigm built upon a leading few-step model, Trajectory Distribution Matching (TDM). TDM-R1 decouples the learning process into surrogate reward learning and generator learning. Furthermore, we develop practical methods to obtain per-step reward signals along the…
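The decoupling described in the abstract can be illustrated with a toy sketch. This is not the TDM-R1 algorithm, whose details are not given on this page; it only shows the general idea of first fitting a differentiable surrogate to a non-differentiable reward and then updating a generator with the surrogate's gradient. Everything here is an assumption made for illustration: the scalar "generator" parameter `theta`, the binary reward `hard_reward`, and the quadratic least-squares surrogate.

```python
import numpy as np

def hard_reward(x):
    """Hypothetical non-differentiable reward: a binary 'like' when x is near 2."""
    return (np.abs(np.asarray(x) - 2.0) < 0.5).astype(float)

# Stage 1 (surrogate reward learning): fit a smooth quadratic to reward labels.
# The hard reward has zero gradient almost everywhere; the fitted surrogate does not.
xs = np.linspace(0.0, 4.0, 80)
surrogate = np.polyfit(xs, hard_reward(xs), deg=2)   # coefficients, highest degree first
surrogate_grad = np.polyder(surrogate)               # derivative of the surrogate

# Stage 2 (generator learning): the toy "generator" draws samples around theta
# and ascends the surrogate's gradient, never differentiating hard_reward itself.
rng = np.random.default_rng(0)
theta = 0.5
for _ in range(50):
    samples = theta + 0.1 * rng.standard_normal(64)
    theta += 0.5 * np.polyval(surrogate_grad, samples).mean()

theta_final = float(theta)
```

In this toy setting `theta` starts where the hard reward is zero and drifts toward the rewarded region around 2, even though the reward itself provides no usable gradient.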

Code & Models

Models

🤗 Luo-Yihong/TDM-R1 · model · ♡ 9

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Reinforcement Learning in Robotics · Face recognition and analysis