dFlowGRPO: Rate-Aware Policy Optimization for Discrete Flow Models
Zhengyan Wan, Yidong Ouyang, Panwen Hu, Qiang Sun

TL;DR
dFlowGRPO is a reinforcement learning framework for discrete flow models that improves image generation and understanding by leveraging trajectory probabilities and transition rates.
Contribution
It introduces a unified RL approach supporting various probability paths and source distributions for discrete flow models, with applications to multimodal tasks.
Findings
Outperforms existing GRPO methods on text-to-image generation.
Achieves performance competitive with continuous flow models.
Demonstrates strong multimodal understanding capabilities.
Abstract
Discrete flow models (DFMs) are a flexible class of generative models for discrete data, and diffusion large language models (dLLMs) can be viewed as a special case with a specific choice of mixture path and a masked source distribution. While several recent works have explored applying reinforcement learning to dLLMs, its application to more general discrete flow models remains underexplored. In this work, we present discrete Flow-GRPO (dFlowGRPO), a unified reinforcement learning framework for discrete flow models that supports a broad family of probability paths and non-masked source distributions. We derive the full trajectory probability for DFMs and formulate denoising as a Markov decision process, enabling dFlowGRPO to incorporate information from both the associated conditional transition rates and the posterior model during reinforcement learning. We apply dFlowGRPO to…
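As a rough illustration of two generic ingredients the abstract refers to, the sketch below shows the group-relative advantage normalization commonly used in GRPO-style methods, and how a trajectory log-probability factorizes into per-step terms once denoising is treated as a Markov decision process. This is not the paper's implementation: the function names `group_relative_advantages` and `trajectory_log_prob` are hypothetical, and the per-step log-probabilities are assumed to be supplied by some model. How dFlowGRPO computes them from transition rates and the posterior model is not specified in this summary.

```python
import math


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize the rewards of one sampled group to zero mean and unit scale.

    Hypothetical helper: standard GRPO-style normalization, not taken
    from the dFlowGRPO paper.
    """
    n = len(rewards)
    mean = sum(rewards) / n
    # Population standard deviation of the rewards within the group.
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    # eps avoids division by zero when every sample gets the same reward.
    return [(r - mean) / (std + eps) for r in rewards]


def trajectory_log_prob(step_log_probs):
    """Sum per-step transition log-probabilities along one trajectory.

    Under the Markov assumption, the log-probability of a denoising
    trajectory is the sum of its per-step log-probabilities. The inputs
    are assumed to come from a model; they are plain floats here.
    """
    return sum(step_log_probs)


# Example: three samples generated for the same prompt, scored by a reward model.
advantages = group_relative_advantages([1.0, 2.0, 3.0])
log_p = trajectory_log_prob([math.log(0.5), math.log(0.5)])
```

In this example the lowest-reward sample receives a negative advantage, the highest a positive one, and the advantages sum to approximately zero, so the policy update is driven by each sample's standing within its group instead of the absolute reward scale.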
