V-GRPO: Online Reinforcement Learning for Denoising Generative Models Is Easier than You Think

Bingda Tang; Yuhui Zhang; Xiaohan Wang; Jiayuan Mao; Ludwig Schmidt; Serena Yeung-Levy

arXiv:2604.23380 · cs.LG · April 28, 2026

TL;DR

V-GRPO is an ELBO-based online reinforcement learning method for denoising generative models that is both stable and efficient. It reports state-of-the-art text-to-image results with a 2x speedup over MixGRPO and a 3x speedup over DiffusionNFT.

Contribution

The paper demonstrates that ELBO-based RL for denoising generative models can be both stable and efficient, and that it can surpass MDP-based methods.

Findings

01. V-GRPO achieves state-of-the-art results in text-to-image synthesis.

02. It delivers a 2x speedup over MixGRPO.

03. It delivers a 3x speedup over DiffusionNFT.

Abstract

Aligning denoising generative models with human preferences or verifiable rewards remains a key challenge. While policy-gradient online reinforcement learning (RL) offers a principled post-training framework, its direct application is hindered by the intractable likelihoods of these models. Prior work therefore either optimizes an induced Markov decision process (MDP) over sampling trajectories, which is stable but inefficient, or uses likelihood surrogates based on the diffusion evidence lower bound (ELBO), which have so far underperformed on visual generation. Our key insight is that the ELBO-based approach can, in fact, be made both stable and efficient. By reducing surrogate variance and controlling gradient steps, we show that this approach can beat MDP-based methods. To this end, we introduce Variational GRPO (V-GRPO), a method that integrates ELBO-based surrogates with the Group…
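As a rough illustration of the ingredients the abstract names, the sketch below shows the group-relative advantage normalization commonly used in GRPO-style methods, followed by a generic advantage-weighted average of per-sample likelihood surrogates. It is not the paper's objective: the function names, the `eps` constant, and the assumption that each sample already has a scalar ELBO estimate are all hypothetical and chosen only for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group of samples for the same prompt.

    Hypothetical helper: subtracts the group mean and divides by the
    group (population) standard deviation, as GRPO-style methods commonly do.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def weighted_surrogate(advantages, elbo_estimates):
    """Advantage-weighted mean of per-sample ELBO estimates.

    Assumption: elbo_estimates[i] is a scalar lower bound on the
    log-likelihood of sample i. Increasing this quantity pushes up the
    surrogate likelihood of above-average samples and pushes down that of
    below-average ones. This is a generic policy-gradient-style weighting,
    not the V-GRPO loss.
    """
    if len(advantages) != len(elbo_estimates):
        raise ValueError("advantages and elbo_estimates must have equal length")
    total = sum(a * e for a, e in zip(advantages, elbo_estimates))
    return total / len(advantages)


if __name__ == "__main__":
    rewards = [1.0, 2.0, 3.0, 4.0]        # made-up rewards for one group
    elbos = [-4.0, -3.0, -2.0, -1.0]      # made-up ELBO estimates
    adv = group_relative_advantages(rewards)
    print(adv)
    print(weighted_surrogate(adv, elbos))
```

Because the advantages are centered within each group, no learned value function is needed as a baseline, and a group whose samples all receive the same reward contributes zero signal.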

Code & Models

Repositories

tang-bd/v-grpo (GitHub)
