GenPO: Generative Diffusion Models Meet On-Policy Reinforcement Learning
Shutong Ding, Ke Hu, Shan Zhong, Haoyang Luo, Weinan Zhang, Jingya Wang, Jun Wang, Ye Shi

TL;DR
GenPO is a framework that integrates generative diffusion policies into on-policy reinforcement learning, enabling efficient training on complex robotic tasks with improved exploration and policy expressiveness.
Contribution
The paper proposes an invertible diffusion policy with a dummy action mechanism, which allows diffusion models to be used effectively in on-policy RL frameworks such as PPO.
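To make the idea of an invertible policy over an augmented action more concrete, here is a minimal sketch of a generic additive coupling update. It is not confirmed to be the paper's construction: the names `velocity`, `forward`, `inverse`, and `log_prob_augmented`, the step count, and the step size are all hypothetical, and `velocity` is a fixed stand-in for what would be a learned network. The real action is `x` and the extra component `y` plays the role of a dummy action. Because each half is updated using only the other half, the map can be undone exactly (up to floating-point rounding) and has unit Jacobian determinant, so the log-likelihood of the output equals the log-density of the noise it came from.

```python
import math

def velocity(v, t):
    # Hypothetical stand-in for a learned network; any deterministic function works here.
    return [math.tanh(vi + t) for vi in v]

def forward(x, y, steps=4, dt=0.25):
    """Map noise (x, y) to an augmented action with alternating additive updates."""
    for k in range(steps):
        t = k * dt
        x = [xi + dt * fi for xi, fi in zip(x, velocity(y, t))]  # x changes, y held fixed
        y = [yi + dt * fi for yi, fi in zip(y, velocity(x, t))]  # y changes, new x held fixed
    return x, y

def inverse(x, y, steps=4, dt=0.25):
    """Undo forward() by replaying the updates in reverse order with flipped signs."""
    for k in reversed(range(steps)):
        t = k * dt
        y = [yi - dt * fi for yi, fi in zip(y, velocity(x, t))]
        x = [xi - dt * fi for xi, fi in zip(x, velocity(y, t))]
    return x, y

def standard_normal_log_prob(z):
    # Log-density of an isotropic standard normal, summed over dimensions.
    return sum(-0.5 * zi * zi - 0.5 * math.log(2.0 * math.pi) for zi in z)

def log_prob_augmented(x, y):
    """Log-likelihood of an augmented action via change of variables.

    Each additive update has unit Jacobian determinant, so no log-det term appears.
    """
    x0, y0 = inverse(x, y)
    return standard_normal_log_prob(x0 + y0)

# Usage: push noise forward, then recover it and evaluate the likelihood.
noise_x, noise_y = [0.3, -1.2], [0.5, 0.1]
act_x, act_y = forward(noise_x, noise_y)
rec_x, rec_y = inverse(act_x, act_y)
logp = log_prob_augmented(act_x, act_y)
```

This summary does not say how the dummy component is combined with the real action before execution, so the sketch stops at the augmented pair.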
Findings
GenPO outperforms existing RL baselines on eight IsaacLab benchmarks.
It is the first to successfully incorporate diffusion policies into on-policy RL.
It demonstrates improved exploration and policy expressiveness in robotic tasks.
Abstract
Recent advances in reinforcement learning (RL) have demonstrated the powerful exploration capabilities and multimodality of generative diffusion-based policies. While substantial progress has been made in offline RL and off-policy RL settings, integrating diffusion policies into on-policy frameworks like PPO remains underexplored. This gap is particularly significant given the widespread use of large-scale parallel GPU-accelerated simulators, such as IsaacLab, which are optimized for on-policy RL algorithms and enable rapid training of complex robotic tasks. A key challenge lies in computing state-action log-likelihoods under diffusion policies, which is straightforward for Gaussian policies but intractable for flow-based models due to irreversible forward-reverse processes and discretization errors (e.g., Euler-Maruyama approximations). To bridge this gap, we propose GenPO, a…
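The abstract's point that log-likelihoods are straightforward for Gaussian policies can be illustrated with a short sketch. It shows the closed-form log-probability of a diagonal Gaussian policy and the standard PPO clipped surrogate that consumes it. The function names and the default `clip_eps=0.2` are illustrative choices, not taken from the paper. A diffusion policy has no such closed form for `gaussian_log_prob`, which is the gap the abstract describes.

```python
import math

def gaussian_log_prob(action, mean, std):
    """Closed-form log-likelihood of an action under a diagonal Gaussian policy."""
    return sum(
        -0.5 * ((a - m) / s) ** 2 - math.log(s) - 0.5 * math.log(2.0 * math.pi)
        for a, m, s in zip(action, mean, std)
    )

def ppo_clipped_objective(logp_new, logp_old, advantage, clip_eps=0.2):
    """Per-sample PPO clipped surrogate (to be maximized)."""
    ratio = math.exp(logp_new - logp_old)  # importance ratio pi_new / pi_old
    clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    return min(ratio * advantage, clipped * advantage)

# Usage: score one action under the old and updated policy, then form the surrogate.
action = [0.1, -0.4]
logp_old = gaussian_log_prob(action, mean=[0.0, 0.0], std=[1.0, 1.0])
logp_new = gaussian_log_prob(action, mean=[0.1, -0.3], std=[1.0, 1.0])
objective = ppo_clipped_objective(logp_new, logp_old, advantage=1.5)
```

The importance ratio depends only on the difference of two log-likelihoods, which is why an exact, cheap log-likelihood is the key requirement for plugging any policy class into PPO.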
Taxonomy
Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · Robotic Locomotion and Control
Methods: Entropy Regularization · Diffusion · Proximal Policy Optimization
