TL;DR
This paper introduces UDM-GRPO, a framework that integrates Uniform Discrete Diffusion Models (UDM) with reinforcement learning via GRPO. It improves the base model on several text-to-image benchmarks, for example raising GenEval accuracy from 69% to 96%.
Contribution
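The paper presents what the authors describe as the first framework to integrate UDM with RL. It treats the final clean sample as the action, reconstructs trajectories via the diffusion forward process, and adds two strategies, Reduced-Step and CFG-Free, to improve training efficiency.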
Findings
GenEval accuracy improves from 69% to 96%.
PickScore increases from 20.46 to 23.81.
OCR benchmark accuracy rises from 8% to 57%.
Abstract
Uniform Discrete Diffusion Model (UDM) has recently emerged as a promising paradigm for discrete generative modeling; however, its integration with reinforcement learning remains largely unexplored. We observe that naively applying GRPO to UDM leads to training instability and marginal performance gains. To address this, we propose UDM-GRPO, the first framework to integrate UDM with RL. Our method is guided by two key insights: (i) treating the final clean sample as the action provides more accurate and stable optimization signals; and (ii) reconstructing trajectories via the diffusion forward process better aligns probability paths with the pretraining distribution. Additionally, we introduce two strategies, Reduced-Step and CFG-Free, to further improve training efficiency. UDM-GRPO significantly improves base model performance across multiple T2I tasks. Notably, GenEval accuracy…
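The two insights in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the usual GRPO convention of standardizing rewards within a group of samples, and a common uniform-noise forward process in which each token is kept with some probability and otherwise resampled uniformly from the vocabulary. The function names, the `keep_probs` schedule, and the toy vocabulary size are all hypothetical.

```python
import random
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: standardize rewards within one sample group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def uniform_forward_noise(x0, keep_prob, vocab_size, rng):
    """Assumed forward process: keep each clean token with probability
    keep_prob, otherwise replace it with a uniformly drawn token."""
    return [
        tok if rng.random() < keep_prob else rng.randrange(vocab_size)
        for tok in x0
    ]


def reconstruct_trajectory(x0, keep_probs, vocab_size, seed=0):
    """Rebuild noisy states from the final clean sample x0 by applying the
    forward process at each noise level (hypothetical schedule), instead of
    reusing the states visited during sampling."""
    rng = random.Random(seed)
    return [uniform_forward_noise(x0, k, vocab_size, rng) for k in keep_probs]


if __name__ == "__main__":
    clean_sample = [3, 1, 4, 1, 5]          # final clean tokens (the "action")
    states = reconstruct_trajectory(clean_sample, [0.2, 0.6, 1.0], vocab_size=8)
    advantages = group_relative_advantages([0.1, 0.5, 0.9])
    print(states)
    print(advantages)
```

In this sketch the reward is attached to the clean sample, and the noisy states used for the policy update are drawn from the forward process rather than taken from the sampling path, which is the alignment with the pretraining distribution that the abstract describes.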
