GenPO: Generative Diffusion Models Meet On-Policy Reinforcement Learning

Shutong Ding; Ke Hu; Shan Zhong; Haoyang Luo; Weinan Zhang; Jingya Wang; Jun Wang; Ye Shi

arXiv:2505.18763 · cs.LG · January 23, 2026

TL;DR

GenPO introduces a novel framework that integrates generative diffusion policies into on-policy reinforcement learning, enabling efficient training of complex robotic tasks with improved exploration and policy expressiveness.

Contribution

It proposes a new invertible diffusion policy method with a dummy action mechanism, allowing diffusion models to be used effectively in on-policy RL frameworks like PPO.
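
This summary does not spell out how the invertible policy or the dummy action is constructed, so the following is only a generic illustration of what "exactly invertible" can mean for a map acting on a pair of variables. It is a sketch under assumptions, not the paper's method: the functions `f` and `g`, the names `forward` and `inverse`, and the alternating additive update are all hypothetical choices made for the example.

```python
import math

# Hypothetical update functions; any functions would do, invertible or not.
def f(v):
    return [0.5 * math.tanh(t) for t in v]

def g(v):
    return [0.3 * math.sin(t) for t in v]

def forward(x, y):
    """Alternating additive update on the pair (x, y)."""
    x_new = [a + b for a, b in zip(x, f(y))]      # x is shifted using y only
    y_new = [a + b for a, b in zip(y, g(x_new))]  # y is shifted using the new x only
    return x_new, y_new

def inverse(x_new, y_new):
    """Undo `forward` by subtracting the same shifts in reverse order."""
    y = [a - b for a, b in zip(y_new, g(x_new))]
    x = [a - b for a, b in zip(x_new, f(y))]
    return x, y

# Round trip: the original pair is recovered up to floating-point rounding.
x0, y0 = [0.2, -1.0], [0.7, 0.4]
x1, y1 = forward(x0, y0)
x_rec, y_rec = inverse(x1, y1)
```

Each half-step shifts one variable by a quantity that depends only on the other, so the map can be undone without solving any equation, and its Jacobian determinant is 1. For a map of this kind, the change-of-variables formula gives the density of the output directly from the density of the input.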

Findings

01. GenPO outperforms existing RL baselines on eight IsaacLab benchmarks.

02. It is the first to successfully incorporate diffusion policies into on-policy RL.

03. It demonstrates improved exploration and policy expressiveness in robotic tasks.

Abstract

Recent advances in reinforcement learning (RL) have demonstrated the powerful exploration capabilities and multimodality of generative diffusion-based policies. While substantial progress has been made in offline RL and off-policy RL settings, integrating diffusion policies into on-policy frameworks like PPO remains underexplored. This gap is particularly significant given the widespread use of large-scale parallel GPU-accelerated simulators, such as IsaacLab, which are optimized for on-policy RL algorithms and enable rapid training of complex robotic tasks. A key challenge lies in computing state-action log-likelihoods under diffusion policies, which is straightforward for Gaussian policies but intractable for flow-based models due to irreversible forward-reverse processes and discretization errors (e.g., Euler-Maruyama approximations). To bridge this gap, we propose GenPO, a…
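
To make the log-likelihood requirement concrete, here is a minimal sketch of why it is straightforward for a Gaussian policy and where PPO uses it. The helper names `gaussian_log_prob` and `ppo_clipped_objective` and the default `clip_eps=0.2` are illustrative assumptions, not taken from the paper; the clipped surrogate itself is the standard PPO form.

```python
import math

def gaussian_log_prob(action, mean, std):
    """Closed-form log-density of a diagonal Gaussian policy at `action`."""
    return sum(
        -0.5 * ((a - m) / s) ** 2 - math.log(s) - 0.5 * math.log(2.0 * math.pi)
        for a, m, s in zip(action, mean, std)
    )

def ppo_clipped_objective(log_prob_new, log_prob_old, advantage, clip_eps=0.2):
    """Standard PPO clipped surrogate for one (state, action) sample."""
    # The importance ratio needs the log-likelihood of the sampled action
    # under both the current and the data-collecting policy.
    ratio = math.exp(log_prob_new - log_prob_old)
    clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    return min(ratio * advantage, clipped * advantage)

# Example: the same action scored under an old and an updated policy.
action = [0.3, -0.1]
lp_old = gaussian_log_prob(action, mean=[0.0, 0.0], std=[1.0, 1.0])
lp_new = gaussian_log_prob(action, mean=[0.1, 0.0], std=[1.0, 1.0])
objective = ppo_clipped_objective(lp_new, lp_old, advantage=1.5)
```

The ratio in `ppo_clipped_objective` is the quantity that becomes hard to obtain when the policy is a multi-step generative sampler rather than a single Gaussian, since no closed-form density like `gaussian_log_prob` is available.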

Videos

GenPO: Generative Diffusion Models Meet On-Policy Reinforcement Learning · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · Robotic Locomotion and Control

Methods: Entropy Regularization · Diffusion · Proximal Policy Optimization