Policy Representation via Diffusion Probability Model for Reinforcement Learning
Long Yang, Zhixiong Huang, Fenghao Lei, Yucun Zhong, Yiming Yang, Cong Fang, Shiting Wen, Binbin Zhou, Zhouchen Lin

TL;DR
This paper introduces a diffusion-based policy representation for reinforcement learning that supports multimodal policies and improves exploration. The resulting algorithm, DIPO, shows superior performance on continuous control tasks.
Contribution
It provides a theoretical foundation for diffusion policy in RL, proposes the DIPO algorithm, and demonstrates its effectiveness on benchmark tasks.
Findings
DIPO outperforms existing methods on MuJoCo benchmarks.
Diffusion policy offers a multimodal and expressive policy representation.
Theoretical convergence guarantees support the method's validity.
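A toy one-dimensional example shows why a unimodal policy loses expressiveness. The setup is hypothetical and not taken from the paper. The good actions form an equal-weight mixture of two narrow Gaussians at -1 and +1 (for instance, steering left or right around an obstacle). A single Gaussian policy is then fit to them by maximum likelihood. The helper names (`sample_bimodal`, `fit_gaussian`, `mixture_pdf`) are illustrative only.

```python
import math
import random
import statistics


def sample_bimodal(n, rng, modes=(-1.0, 1.0), std=0.1):
    """Draw n actions from a hypothetical equal-weight two-mode Gaussian mixture."""
    return [rng.gauss(rng.choice(modes), std) for _ in range(n)]


def fit_gaussian(samples):
    """Maximum-likelihood fit of one Gaussian: sample mean and population std."""
    mu = statistics.fmean(samples)
    sigma = statistics.pstdev(samples, mu)
    return mu, sigma


def gaussian_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def mixture_pdf(x, modes=(-1.0, 1.0), std=0.1):
    """True density of the hypothetical two-mode target."""
    return sum(gaussian_pdf(x, m, std) for m in modes) / len(modes)


rng = random.Random(0)
actions = sample_bimodal(10_000, rng)
mu, sigma = fit_gaussian(actions)
print(f"fitted Gaussian: mu={mu:.2f}, sigma={sigma:.2f}")
# The unimodal fit's most likely action is mu (near 0), where the target has almost no mass.
print(f"target density at mu: {mixture_pdf(mu):.3g}")
print(f"target density at a mode: {mixture_pdf(1.0):.3g}")
```

The fitted Gaussian places its most likely action near 0, an action the target almost never takes. A multimodal representation such as a diffusion policy is meant to avoid exactly this failure mode.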
Abstract
Popular reinforcement learning (RL) algorithms tend to produce a unimodal policy distribution, which limits the expressiveness of complex policies and weakens exploration. Diffusion probability models are powerful at learning complex multimodal distributions and have shown promising applications in RL. In this paper, we formally build a theoretical foundation of policy representation via the diffusion probability model and provide practical implementations of diffusion policy for online model-free RL. Concretely, we characterize diffusion policy as a stochastic process, which is a new approach to representing a policy. Then we present a convergence guarantee for diffusion policy, which provides a theory to understand the multimodality of diffusion policy. Furthermore, we propose DIPO, an implementation for model-free online RL with DIffusion…
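Treating a diffusion policy as a stochastic process means an action is produced by starting from Gaussian noise and running a reverse (denoising) process conditioned on the state. A minimal sketch of that sampling loop follows. It assumes a DDPM-style ancestral update with a linear noise schedule of 100 steps, and it is not the authors' DIPO implementation. So that it runs without training, the learned noise-prediction network is replaced by `ideal_noise_predictor`. This is the exact optimal predictor for a hypothetical state-dependent two-mode target (`target_modes`). All names and schedule values here are illustrative assumptions.

```python
import math
import random


def make_schedule(num_steps, beta_start=1e-4, beta_end=0.2):
    """Hypothetical linear noise schedule: returns (betas, alphas, alpha_bars)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    betas = [beta_start + i * step for i in range(num_steps)]
    alphas = [1.0 - b for b in betas]
    alpha_bars, prod = [], 1.0
    for a in alphas:
        prod *= a
        alpha_bars.append(prod)
    return betas, alphas, alpha_bars


def target_modes(state):
    """Hypothetical state-dependent target: good actions at state - 1 and state + 1."""
    return (state - 1.0, state + 1.0)


def ideal_noise_predictor(x, state, alpha_bar, tau=0.1):
    """Stand-in for a trained network eps_theta(x_t, s, t).

    For an equal-weight Gaussian-mixture target with component std tau, the noised
    marginal is a mixture of N(sqrt(alpha_bar) * m_k, v) with
    v = alpha_bar * tau**2 + 1 - alpha_bar. The optimal noise prediction is
    -sqrt(1 - alpha_bar) times the score grad_x log p_t(x).
    """
    v = alpha_bar * tau ** 2 + 1.0 - alpha_bar
    centers = [math.sqrt(alpha_bar) * m for m in target_modes(state)]
    logits = [-((x - c) ** 2) / (2.0 * v) for c in centers]
    top = max(logits)
    weights = [math.exp(logit - top) for logit in logits]  # shifted for numerical stability
    total = sum(weights)
    score = sum((w / total) * (-(x - c) / v) for w, c in zip(weights, centers))
    return -math.sqrt(1.0 - alpha_bar) * score


def sample_action(state, schedule, rng):
    """DDPM-style reverse process: draw x_T ~ N(0, 1), then denoise step by step to x_0."""
    betas, alphas, alpha_bars = schedule
    x = rng.gauss(0.0, 1.0)
    for t in reversed(range(len(betas))):
        eps = ideal_noise_predictor(x, state, alpha_bars[t])
        mean = (x - betas[t] / math.sqrt(1.0 - alpha_bars[t]) * eps) / math.sqrt(alphas[t])
        # Add fresh noise on every step except the final one.
        noise = math.sqrt(betas[t]) * rng.gauss(0.0, 1.0) if t > 0 else 0.0
        x = mean + noise
    return x


rng = random.Random(0)
schedule = make_schedule(100)
actions = [sample_action(0.0, schedule, rng) for _ in range(1000)]
left = sum(a < -0.5 for a in actions) / len(actions)
right = sum(a > 0.5 for a in actions) / len(actions)
print(f"fraction near -1: {left:.2f}, near +1: {right:.2f}")
```

With the ideal predictor, the sampled actions should split between both modes instead of collapsing between them. In a real agent the analytic predictor would typically be replaced by a trained network conditioned on the state and step index, while the sampling loop stays the same.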
Taxonomy
Topics: Reinforcement Learning in Robotics
Methods: Diffusion
