Diffusion Policies as an Expressive Policy Class for Offline Reinforcement Learning
Zhendong Wang, Jonathan J Hunt, Mingyuan Zhou

TL;DR
This paper introduces Diffusion-QL, an offline RL method that represents the policy as a conditional diffusion model and adds an action-value maximization term to the diffusion training loss. The resulting policy combines behavior cloning with policy improvement and outperforms prior methods on D4RL benchmark tasks.
Contribution
The paper proposes using diffusion models as policy representations in offline RL, enabling more expressive policies and improved performance over existing methods.
Findings
Diffusion-QL outperforms prior methods on D4RL benchmarks.
The diffusion model-based policy captures complex, multimodal action distributions.
The approach balances behavior cloning and policy improvement within a single training loss.
Abstract
Offline reinforcement learning (RL), which aims to learn an optimal policy using a previously collected static dataset, is an important paradigm of RL. Standard RL methods often perform poorly in this regime due to the function approximation errors on out-of-distribution actions. While a variety of regularization methods have been proposed to mitigate this issue, they are often constrained by policy classes with limited expressiveness that can lead to highly suboptimal solutions. In this paper, we propose representing the policy as a diffusion model, a recent class of highly-expressive deep generative models. We introduce Diffusion Q-learning (Diffusion-QL) that utilizes a conditional diffusion model to represent the policy. In our approach, we learn an action-value function and we add a term maximizing action-values into the training loss of the conditional diffusion model, which…
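The loss structure described in the abstract (a diffusion training loss plus a term that maximizes action-values) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the linear noise schedule, the weight `eta`, and the stand-ins `toy_eps_model` and `toy_q` are all hypothetical, and the sketch only evaluates the loss value. In actual training the gradient of the Q term would be propagated through the sampled actions by an autodiff framework.

```python
import numpy as np

def make_schedule(num_steps, beta_min=1e-4, beta_max=0.02):
    """Linear variance schedule (an assumed choice) and its cumulative products."""
    betas = np.linspace(beta_min, beta_max, num_steps)
    alpha_bars = np.cumprod(1.0 - betas)
    return betas, alpha_bars

def behavior_cloning_loss(eps_model, states, actions, alpha_bars, rng):
    """Noise-prediction loss on dataset actions: the behavior-cloning term."""
    t = rng.integers(0, len(alpha_bars), size=actions.shape[0])
    eps = rng.standard_normal(actions.shape)
    a_bar = alpha_bars[t][:, None]
    # Forward diffusion: corrupt the dataset action with Gaussian noise.
    noisy = np.sqrt(a_bar) * actions + np.sqrt(1.0 - a_bar) * eps
    return float(np.mean((eps - eps_model(noisy, states, t)) ** 2))

def sample_actions(eps_model, states, action_dim, betas, alpha_bars, rng):
    """Reverse diffusion: denoise Gaussian noise into actions, conditioned on states."""
    a = rng.standard_normal((states.shape[0], action_dim))
    for t in reversed(range(len(betas))):
        t_batch = np.full(states.shape[0], t)
        eps_hat = eps_model(a, states, t_batch)
        a = (a - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(1.0 - betas[t])
        if t > 0:  # no noise is added at the final denoising step
            a = a + np.sqrt(betas[t]) * rng.standard_normal(a.shape)
    return a

def combined_loss(bc_loss, q_values, eta):
    """Behavior cloning minus eta times mean Q: minimizing this maximizes action-values."""
    return bc_loss - eta * float(np.mean(q_values))

# Hypothetical stand-ins for the learned networks.
def toy_eps_model(noisy_actions, states, t):
    return 0.1 * (noisy_actions - states)

def toy_q(states, actions):
    return -np.sum((actions - states) ** 2, axis=1)

rng = np.random.default_rng(0)
betas, alpha_bars = make_schedule(num_steps=5)
states = rng.standard_normal((8, 2))
dataset_actions = states + 0.1 * rng.standard_normal((8, 2))

bc = behavior_cloning_loss(toy_eps_model, states, dataset_actions, alpha_bars, rng)
policy_actions = sample_actions(toy_eps_model, states, 2, betas, alpha_bars, rng)
loss = combined_loss(bc, toy_q(states, policy_actions), eta=1.0)
print(loss)
```

The Q term is evaluated on actions drawn from the policy itself rather than on dataset actions, which is what lets it push the policy toward higher-value actions while the first term keeps it close to the behavior data. A larger `eta` shifts the balance toward policy improvement.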
Taxonomy
Topics: Reinforcement Learning in Robotics
Methods: Q-Learning · Diffusion
