UDM-GRPO: Stable and Efficient Group Relative Policy Optimization for Uniform Discrete Diffusion Models

Jiaqi Wang; Haoge Deng; Ting Pan; Yang Liu; Chengyuan Wang; Fan Zhang; Yonggang Qi; Xinlong Wang

arXiv:2604.18518·cs.CV·April 22, 2026

TL;DR

This paper introduces UDM-GRPO, a framework that integrates Uniform Discrete Diffusion Models (UDM) with reinforcement learning and significantly improves base model performance across multiple text-to-image (T2I) tasks.

Contribution

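The paper presents the first framework to integrate UDM with RL in a stable and efficient way. It treats the final clean sample as the action, reconstructs trajectories via the diffusion forward process, and adds two strategies, Reduced-Step and CFG-Free, to improve training efficiency.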

Findings

01. GenEval accuracy improves from 69% to 96%.
02. PickScore increases from 20.46 to 23.81.
03. OCR benchmark accuracy rises from 8% to 57%.

Abstract

Uniform Discrete Diffusion Model (UDM) has recently emerged as a promising paradigm for discrete generative modeling; however, its integration with reinforcement learning remains largely unexplored. We observe that naively applying GRPO to UDM leads to training instability and marginal performance gains. To address this, we propose UDM-GRPO, the first framework to integrate UDM with RL. Our method is guided by two key insights: (i) treating the final clean sample as the action provides more accurate and stable optimization signals; and (ii) reconstructing trajectories via the diffusion forward process better aligns probability paths with the pretraining distribution. Additionally, we introduce two strategies, Reduced-Step and CFG-Free, to further improve training efficiency. UDM-GRPO significantly improves base model performance across multiple T2I tasks. Notably, GenEval accuracy…
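
The two insights in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names, the `keep_probs` schedule, and the choice to sample each noisy state independently from the clean sample are assumptions made for illustration. It shows the standard GRPO-style group normalization of rewards, plus one plausible way to rebuild a trajectory by applying a uniform forward corruption to the final clean sample instead of reusing the states visited during sampling.

```python
import random
import statistics


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: center and scale rewards within one group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def forward_corrupt(x0, keep_prob, vocab_size, rng):
    """Uniform forward process applied to a clean token sequence.

    Each token is kept with probability keep_prob; otherwise it is
    resampled uniformly from the vocabulary (it may land on the same token).
    """
    return [
        tok if rng.random() < keep_prob else rng.randrange(vocab_size)
        for tok in x0
    ]


def reconstruct_trajectory(x0, keep_probs, vocab_size, seed=0):
    """Hypothetical helper: one noisy state per entry of keep_probs.

    Assumption: each state is drawn independently from the clean sample x0,
    rather than chained step by step.
    """
    rng = random.Random(seed)
    return [forward_corrupt(x0, p, vocab_size, rng) for p in keep_probs]


if __name__ == "__main__":
    rewards = [0.2, 0.9, 0.5, 0.4]  # made-up rewards for one group of samples
    print(group_relative_advantages(rewards))

    clean = [3, 1, 4, 1, 5]  # made-up final clean sample (token ids)
    for state in reconstruct_trajectory(clean, [0.25, 0.5, 0.75, 1.0], vocab_size=8):
        print(state)
```

In this sketch the advantage is attached to the clean sample as a whole, and the reconstructed noisy states only supply the inputs at which the policy would be evaluated. How the paper weights or selects timesteps, including the Reduced-Step and CFG-Free strategies, is not covered here.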

Code & Models

Repositories

Yovecent/UDM-GRPO (GitHub)
