Policy Representation via Diffusion Probability Model for Reinforcement Learning

Long Yang; Zhixiong Huang; Fenghao Lei; Yucun Zhong; Yiming Yang; Cong Fang; Shiting Wen; Binbin Zhou; Zhouchen Lin

arXiv:2305.13122 · cs.LG · May 23, 2023 · 6 cites

TL;DR

This paper introduces a diffusion-based policy representation for reinforcement learning that supports multimodal policies and improves exploration. The proposed algorithm, DIPO, outperforms existing methods on MuJoCo continuous-control benchmarks.

Contribution

It provides a theoretical foundation for diffusion policy in RL, proposes the DIPO algorithm, and demonstrates its effectiveness on benchmark tasks.

Findings

1. DIPO outperforms existing methods on MuJoCo benchmarks.
2. Diffusion policy offers a multimodal and expressive policy representation.
3. A convergence guarantee for diffusion policy provides theoretical support for the method.
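The multimodality finding above can be illustrated with a small sketch. It assumes a hypothetical 1-D action space in which two actions, near -1 and +1, are equally good (the numbers are illustrative, not from the paper). Fitting a single Gaussian policy by maximum likelihood, the unimodal case, places its peak between the two modes, so it assigns noticeable probability to actions the data almost never uses.

```python
import math
import random
import statistics

# Hypothetical bimodal action data: two equally good actions near -1.0 and +1.0
# (for example, steering left or right around an obstacle). Values are illustrative.
rng = random.Random(0)
actions = [rng.choice([-1.0, 1.0]) + rng.gauss(0.0, 0.1) for _ in range(10_000)]

# Maximum-likelihood fit of a single Gaussian policy (the unimodal case).
mu = statistics.fmean(actions)
sigma = statistics.pstdev(actions)


def gaussian_mass(lo, hi, mean, std):
    """Probability that a sample from N(mean, std^2) falls in [lo, hi]."""
    def z(x):
        return (x - mean) / (std * math.sqrt(2.0))
    return 0.5 * (math.erf(z(hi)) - math.erf(z(lo)))


# Compare the probability of the gap between the modes, |a| < 0.25,
# under the data and under the fitted Gaussian.
true_fraction = sum(abs(a) < 0.25 for a in actions) / len(actions)
fitted_mass = gaussian_mass(-0.25, 0.25, mu, sigma)
print(f"fitted Gaussian: mu={mu:.2f}, sigma={sigma:.2f}")
print(f"mass in |a| < 0.25: data={true_fraction:.3f}, Gaussian={fitted_mass:.3f}")
```

The fitted mean lands near 0 with a standard deviation near 1, so roughly a fifth of the Gaussian's samples fall in a region the data essentially never visits. A multimodal representation such as a diffusion policy can instead place mass on both modes.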

Abstract

Popular reinforcement learning (RL) algorithms tend to produce a unimodal policy distribution, which limits the expressiveness of complicated policies and weakens exploration. The diffusion probability model is powerful at learning complicated multimodal distributions and has shown promising applications in RL. In this paper, we formally build a theoretical foundation of policy representation via the diffusion probability model and provide practical implementations of diffusion policy for online model-free RL. Concretely, we characterize diffusion policy as a stochastic process, which is a new approach to representing a policy. Then we present a convergence guarantee for diffusion policy, which provides a theory to understand the multimodality of diffusion policy. Furthermore, we propose DIPO, an implementation for model-free online RL with DIffusion…
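To make "diffusion policy as a stochastic process" concrete, here is a minimal sketch assuming a standard DDPM-style ancestral sampler over a 1-D action. It is not the DIPO code from the repository. An action is drawn by starting from Gaussian noise and running the reverse (denoising) chain conditioned on the state `s`. In practice a trained network would predict the noise; here `oracle_eps` is a hypothetical stand-in that computes the exact noise prediction for a known two-mode target. The schedule (`T`, `BETAS`) and the mode locations `s - 1` and `s + 1` are illustrative assumptions.

```python
import itertools
import math
import operator
import random

# Hypothetical 1-D toy: for state s, the good actions sit at s - 1 and s + 1.
MODE_OFFSETS = (-1.0, 1.0)
MODE_STD = 0.1

# Linear noise schedule over T diffusion steps (illustrative values).
T = 50
BETAS = [1e-4 + (0.2 - 1e-4) * t / (T - 1) for t in range(T)]
ALPHAS = [1.0 - beta for beta in BETAS]
ALPHA_BARS = list(itertools.accumulate(ALPHAS, operator.mul))  # cumulative products


def oracle_eps(a_t, t, s):
    """Exact E[noise | a_t] for the toy two-mode target (stand-in for a trained network).

    The noised marginal at step t is a mixture of N(sqrt(ab) * mode, ab * std^2 + 1 - ab),
    and the optimal noise prediction is -sqrt(1 - ab) times its score.
    """
    ab = ALPHA_BARS[t]
    var = ab * MODE_STD ** 2 + (1.0 - ab)
    means = [math.sqrt(ab) * (s + off) for off in MODE_OFFSETS]
    logits = [-(a_t - m) ** 2 / (2.0 * var) for m in means]
    top = max(logits)  # subtract the max for numerically stable responsibilities
    weights = [math.exp(l - top) for l in logits]
    total = sum(weights)
    score = sum(w / total * (-(a_t - m) / var) for w, m in zip(weights, means))
    return -math.sqrt(1.0 - ab) * score


def sample_action(s, rng):
    """Reverse process: start from N(0, 1) noise and denoise for t = T-1, ..., 0."""
    a = rng.gauss(0.0, 1.0)
    for t in reversed(range(T)):
        eps = oracle_eps(a, t, s)
        mean = (a - BETAS[t] / math.sqrt(1.0 - ALPHA_BARS[t]) * eps) / math.sqrt(ALPHAS[t])
        noise = rng.gauss(0.0, 1.0) if t > 0 else 0.0  # no noise on the final step
        a = mean + math.sqrt(BETAS[t]) * noise
    return a


rng = random.Random(0)
state = 0.5
actions = [sample_action(state, rng) for _ in range(2000)]
near_low = sum(abs(a - (state - 1.0)) < 0.5 for a in actions) / len(actions)
near_high = sum(abs(a - (state + 1.0)) < 0.5 for a in actions) / len(actions)
print(f"near {state - 1.0}: {near_low:.2f}, near {state + 1.0}: {near_high:.2f}")
```

With the oracle in place, the sampled actions split roughly evenly between the two modes, which a single Gaussian policy cannot represent. Swapping `oracle_eps` for a learned noise predictor is where training enters; that part is not shown here.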

Code & Models

Repositories

bellmantimehut/dipo (PyTorch, official)

Taxonomy

Topics: Reinforcement Learning in Robotics

Methods: Diffusion