Flow-GRPO: Training Flow Matching Models via Online RL

Jie Liu; Gongye Liu; Jiajun Liang; Yangguang Li; Jiaheng Liu; Xintao Wang; Pengfei Wan; Di Zhang; Wanli Ouyang

arXiv:2505.05470 · cs.CV · October 28, 2025

TL;DR

Flow-GRPO applies online reinforcement learning to flow matching models. An ODE-to-SDE conversion makes sampling stochastic so the policy can explore, and a Denoising Reduction strategy cuts the number of denoising steps used during training. Together they improve sampling efficiency and generation accuracy on text-to-image tasks.

Contribution

Flow-GRPO is the first method to integrate online policy gradient RL into flow matching models, improving sampling efficiency and generation quality in text-to-image generation.
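The "GRPO" in the title refers to Group Relative Policy Optimization, whose objective is not spelled out on this page. As a rough illustration only, the sketch below computes the group-normalized advantage commonly associated with GRPO: several samples are drawn for the same prompt, and each reward is standardized against its own group. The function name `group_relative_advantages`, the `eps` constant, and the example rewards are hypothetical and are not taken from the paper or its repository.

```python
import numpy as np

def group_relative_advantages(rewards, eps=1e-6):
    """Standardize rewards within one group of samples for the same prompt.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    # Subtract the group mean and divide by the group standard deviation.
    # eps guards against division by zero when all rewards are equal.
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Four images sampled for one prompt, scored by some reward model (made-up values).
adv = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(adv)
```

Samples that score above their group's mean get a positive advantage and those below get a negative one, so no separate value network is needed to form a baseline.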

Findings

01. Significant accuracy improvements in compositional generation and visual text rendering.

02. Enhanced human preference alignment with minimal reward hacking.

03. Effective application across multiple text-to-image tasks.

Abstract

We propose Flow-GRPO, the first method to integrate online policy gradient reinforcement learning (RL) into flow matching models. Our approach uses two key strategies: (1) an ODE-to-SDE conversion that transforms a deterministic Ordinary Differential Equation (ODE) into an equivalent Stochastic Differential Equation (SDE) that matches the original model's marginal distribution at all timesteps, enabling statistical sampling for RL exploration; and (2) a Denoising Reduction strategy that reduces training denoising steps while retaining the original number of inference steps, significantly improving sampling efficiency without sacrificing performance. Empirically, Flow-GRPO is effective across multiple text-to-image tasks. For compositional generation, RL-tuned SD3.5-M generates nearly perfect object counts, spatial relations, and fine-grained attributes, increasing GenEval accuracy from…
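The two strategies in the abstract can be illustrated on a toy problem. The sketch below is not the paper's sampler: it assumes a 1-D Gaussian path x_t ~ N(0, (1 + t)^2), for which both the ODE velocity and the score are known in closed form, and uses the general fact that adding the drift term (sigma^2 / 2) * score plus noise of scale sigma leaves the marginals of the ODE unchanged. The names `velocity`, `score`, `sample`, the noise level 0.7, and the step counts 10 and 40 are all illustrative assumptions.

```python
import numpy as np

def velocity(x, t):
    # ODE velocity that transports N(0, 1) at t=0 to N(0, (1 + t)^2) at time t.
    return x / (1.0 + t)

def score(x, t):
    # Analytic score (gradient of log-density) of N(0, (1 + t)^2).
    return -x / (1.0 + t) ** 2

def sample(n_samples, n_steps, sigma, seed=0):
    """Euler-Maruyama integration from t=0 to t=1.

    sigma = 0 recovers the deterministic ODE; sigma > 0 gives an SDE
    with the same marginals, which makes each step stochastic.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n_samples)  # x_0 ~ N(0, 1)
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt
        # Score correction compensates for the variance added by the noise.
        drift = velocity(x, t) + 0.5 * sigma**2 * score(x, t)
        noise = sigma * np.sqrt(dt) * rng.standard_normal(n_samples)
        x = x + drift * dt + noise
    return x

# Target standard deviation at t=1 is 1 + 1 = 2 in all three cases.
ode_std = sample(20000, n_steps=40, sigma=0.0).std()
sde_infer_std = sample(20000, n_steps=40, sigma=0.7).std()
# Fewer steps for collecting training samples, in the spirit of Denoising Reduction.
sde_train_std = sample(20000, n_steps=10, sigma=0.7).std()
print(ode_std, sde_infer_std, sde_train_std)
```

In this toy setting the stochastic sampler lands on roughly the same final distribution as the deterministic one, and the coarser 10-step schedule stays close to the 40-step result while needing a quarter of the function evaluations. Whether the same trade-off holds for a real text-to-image model is an empirical question that the paper addresses and this sketch does not.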

Code & Models

Repositories

yifan123/flow_grpo (PyTorch, official)

Videos

Flow-GRPO: Training Flow Matching Models via Online RL (SlidesLive)

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Artificial Intelligence in Games