Diffusion Policies as an Expressive Policy Class for Offline Reinforcement Learning

Zhendong Wang; Jonathan J Hunt; Mingyuan Zhou

arXiv:2208.06193·cs.LG·August 29, 2023·33 cites

TL;DR

This paper introduces Diffusion-QL, a novel offline RL method that employs diffusion models for highly expressive policy representation, leading to superior performance on benchmark tasks by effectively balancing behavior cloning and policy improvement.

Contribution

The paper proposes using diffusion models as policy representations in offline RL, enabling more expressive policies and improved performance over existing methods.

Findings

01. Diffusion-QL outperforms prior methods on D4RL benchmarks.

02. The diffusion model-based policy captures complex, multimodal action distributions.

03. The approach effectively balances behavior cloning and policy improvement.

Abstract

Offline reinforcement learning (RL), which aims to learn an optimal policy using a previously collected static dataset, is an important paradigm of RL. Standard RL methods often perform poorly in this regime due to the function approximation errors on out-of-distribution actions. While a variety of regularization methods have been proposed to mitigate this issue, they are often constrained by policy classes with limited expressiveness that can lead to highly suboptimal solutions. In this paper, we propose representing the policy as a diffusion model, a recent class of highly-expressive deep generative models. We introduce Diffusion Q-learning (Diffusion-QL) that utilizes a conditional diffusion model to represent the policy. In our approach, we learn an action-value function and we add a term maximizing action-values into the training loss of the conditional diffusion model, which…
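The loss structure the abstract describes (a diffusion training loss plus a term that maximizes action-values) can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration and is not taken from the paper: the linear `toy_eps_model`, the hand-written critic `toy_q`, the linear variance schedule, the weight `eta`, and all shapes are hypothetical stand-ins. The sketch only evaluates the combined objective; it does not train anything.

```python
import numpy as np


def make_schedule(num_steps=5, beta_min=1e-4, beta_max=0.2):
    """Linear variance schedule (an assumed, illustrative choice)."""
    betas = np.linspace(beta_min, beta_max, num_steps)
    alphas = 1.0 - betas
    return betas, alphas, np.cumprod(alphas)


def toy_eps_model(noisy_action, state, step, weights):
    """Hypothetical linear stand-in for the conditional noise-prediction network."""
    step_col = np.full((noisy_action.shape[0], 1), float(step))
    features = np.concatenate([noisy_action, state, step_col], axis=1)
    return features @ weights


def toy_q(state, action):
    """Hypothetical critic: higher when the action is near half the leading state entries."""
    target = 0.5 * state[:, : action.shape[1]]
    return -np.sum((action - target) ** 2, axis=1)


def bc_loss(actions, states, weights, schedule, rng):
    """Denoising (behavior-cloning) loss: predict the noise added to dataset actions."""
    _, _, alpha_bars = schedule
    step = int(rng.integers(0, len(alpha_bars)))  # one shared step per batch, for brevity
    noise = rng.standard_normal(actions.shape)
    noisy = np.sqrt(alpha_bars[step]) * actions + np.sqrt(1.0 - alpha_bars[step]) * noise
    predicted = toy_eps_model(noisy, states, step, weights)
    return float(np.mean((predicted - noise) ** 2))


def sample_actions(states, weights, schedule, action_dim, rng):
    """Reverse diffusion: start from Gaussian noise and denoise, conditioned on the state."""
    betas, alphas, alpha_bars = schedule
    action = rng.standard_normal((states.shape[0], action_dim))
    for step in reversed(range(len(betas))):
        eps = toy_eps_model(action, states, step, weights)
        action = (action - betas[step] / np.sqrt(1.0 - alpha_bars[step]) * eps) / np.sqrt(alphas[step])
        if step > 0:  # no noise is added on the final denoising step
            action = action + np.sqrt(betas[step]) * rng.standard_normal(action.shape)
    return action


def combined_loss(actions, states, weights, schedule, eta, rng):
    """Behavior-cloning loss plus eta times the negated mean Q-value of sampled actions."""
    bc = bc_loss(actions, states, weights, schedule, rng)
    sampled = sample_actions(states, weights, schedule, actions.shape[1], rng)
    q_term = -float(np.mean(toy_q(states, sampled)))  # minimizing this maximizes Q
    return bc + eta * q_term, bc, q_term


# Toy usage: 8 transitions, 3-dimensional states, 2-dimensional actions.
rng = np.random.default_rng(0)
states = rng.standard_normal((8, 3))
actions = rng.standard_normal((8, 2))
weights = 0.1 * rng.standard_normal((6, 2))  # 2 action + 3 state + 1 step features
total, bc, q_term = combined_loss(actions, states, weights, make_schedule(), eta=1.0, rng=rng)
print(total, bc, q_term)
```

In this sketch `eta` sets the trade-off: with `eta = 0` the objective reduces to pure behavior cloning, and larger values push sampled actions toward higher critic scores. In a real implementation the noise model and critic would be neural networks trained with automatic differentiation, which this sketch deliberately leaves out.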

Code & Models

Videos

Diffusion Policies as an Expressive Policy Class for Offline Reinforcement Learning · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics

Methods: Q-Learning · Diffusion