dFlowGRPO: Rate-Aware Policy Optimization for Discrete Flow Models

Zhengyan Wan; Yidong Ouyang; Panwen Hu; Qiang Sun

arXiv:2605.09291 · cs.LG · May 12, 2026

TL;DR

dFlowGRPO is a reinforcement learning framework for discrete flow models that improves image generation and understanding by leveraging trajectory probabilities and transition rates.

Contribution

It introduces a unified RL approach supporting various probability paths and source distributions for discrete flow models, with applications to multimodal tasks.

Findings

01. Outperforms existing GRPO methods on text-to-image generation.

02. Achieves performance competitive with continuous flow models.

03. Demonstrates strong understanding capabilities.

Abstract

Discrete flow models (DFMs) are a flexible class of generative models for discrete data; diffusion large language models (dLLMs) can be viewed as a special case that uses a specific mixture path and a masked source distribution. While several recent works have explored applying reinforcement learning to dLLMs, its application to more general discrete flow models remains underexplored. In this work, we present discrete Flow-GRPO (dFlowGRPO), a unified reinforcement learning framework for discrete flow models that supports a broad family of probability paths and non-masked source distributions. We derive the full trajectory probability for DFMs and formulate denoising as a Markov decision process, enabling dFlowGRPO to incorporate information from both the associated conditional transition rates and the posterior model during reinforcement learning. We apply dFlowGRPO to…
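
As a rough illustration of two ingredients named in the abstract, the Python sketch below samples a noisy state from a mixture path (with either a masked or a uniform source) and computes group-relative advantages in the usual GRPO fashion. The linear scheduler kappa(t) = t, the MASK token id, and both function names are assumptions made for illustration and are not taken from the paper. The trajectory-probability and transition-rate terms that dFlowGRPO adds are not shown.

```python
import math
import random

MASK = -1  # hypothetical id for the mask token (assumption, not from the paper)


def sample_mixture_path(x1, t, vocab_size, source="mask", rng=random):
    """Sample x_t from a token-wise mixture path.

    Each position independently keeps the clean token x1[i] with
    probability kappa(t) = t (an assumed linear scheduler) and otherwise
    takes a token from the source distribution: the mask token for
    source="mask", or a uniformly random token for source="uniform".
    """
    xt = []
    for token in x1:
        if rng.random() < t:
            xt.append(token)  # already "denoised" at time t
        elif source == "mask":
            xt.append(MASK)  # masked source, as in dLLMs
        else:
            xt.append(rng.randrange(vocab_size))  # non-masked (uniform) source
    return xt


def group_relative_advantages(rewards, eps=1e-6):
    """Standard GRPO-style advantages: normalize rewards within one group."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    random.seed(0)
    clean = [3, 1, 4, 1, 5]
    print(sample_mixture_path(clean, t=0.5, vocab_size=10))
    print(sample_mixture_path(clean, t=0.5, vocab_size=10, source="uniform"))
    print(group_relative_advantages([0.2, 0.9, 0.5, 0.4]))
```

With source="mask" the sampler reproduces the masked corruption used by dLLMs, while source="uniform" is one example of the non-masked sources the abstract refers to.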
