TDM-R1: Reinforcing Few-Step Diffusion Models with Non-Differentiable Reward
Yihong Luo, Tianyang Hu, Weijian Luo, Jing Tang

TL;DR
TDM-R1 is a reinforcement learning framework that incorporates non-differentiable rewards into few-step diffusion models, improving their generation quality and alignment.
Contribution
Proposes a decoupled RL paradigm that separates surrogate reward learning from generator learning, together with practical methods for reward signal extraction, so that non-differentiable rewards can be used with few-step generative models.
Findings
Achieves state-of-the-art RL performance on in-domain and out-of-domain metrics.
Outperforms existing models on Z-Image with only 4 NFEs (number of function evaluations).
Improves text-to-image quality and preference alignment.
Abstract
While few-step generative models have enabled powerful image and video generation at significantly lower cost, generic reinforcement learning (RL) paradigms for few-step models remain an unsolved problem. Existing RL approaches for few-step diffusion models rely heavily on back-propagating through differentiable reward models, thereby excluding the majority of important real-world reward signals, e.g., non-differentiable rewards such as binary human like/dislike judgments, object counts, etc. To properly incorporate non-differentiable rewards into few-step generative models, we introduce TDM-R1, a novel reinforcement learning paradigm built upon a leading few-step model, Trajectory Distribution Matching (TDM). TDM-R1 decouples the learning process into surrogate reward learning and generator learning. Furthermore, we develop practical methods to obtain per-step reward signals along the…
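The decoupling described in the abstract can be illustrated with a toy sketch. This is not the TDM-R1 algorithm: it assumes a one-dimensional "generator" that samples `theta + noise`, a hypothetical binary reward `hard_reward` that cannot be differentiated, and a logistic-regression surrogate. All names and hyperparameters here are illustrative assumptions. Each iteration first fits the surrogate to the binary reward, then updates the generator by ascending the gradient of the differentiable surrogate.

```python
import numpy as np


def hard_reward(x):
    # Hypothetical non-differentiable reward: 1 if the sample exceeds 2, else 0.
    return (x > 2.0).astype(np.float64)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_decoupled(steps=500, batch=256, seed=0):
    """Toy alternation of surrogate reward learning and generator learning."""
    rng = np.random.default_rng(seed)
    theta = 0.0      # generator parameter: samples are theta + Gaussian noise
    w, b = 0.0, 0.0  # surrogate reward s(x) = sigmoid(w * x + b)
    for _ in range(steps):
        x = theta + rng.standard_normal(batch)
        r = hard_reward(x)  # only reward values are used, never reward gradients

        # Surrogate reward learning: one logistic-regression gradient step
        # on the binary cross-entropy between s(x) and r.
        p = sigmoid(w * x + b)
        w -= 0.5 * np.mean((p - r) * x)
        b -= 0.5 * np.mean(p - r)

        # Generator learning: ascend the surrogate, using
        # d s(theta + eps) / d theta = s * (1 - s) * w.
        p = sigmoid(w * x + b)
        theta += 0.5 * np.mean(p * (1.0 - p) * w)
    return theta, w, b
```

Calling `train_decoupled()` returns the final generator parameter and surrogate weights. In this toy setting the generator mean drifts toward the region where the binary reward fires, even though the reward itself is never differentiated.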
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Reinforcement Learning in Robotics · Face recognition and analysis
