Diffusion Policies creating a Trust Region for Offline Reinforcement Learning

Tianyu Chen; Zhendong Wang; Mingyuan Zhou

arXiv:2405.19690 · cs.LG · November 4, 2024 · 1 citation

Open Access · 1 Repo

TL;DR

This paper introduces Diffusion Trusted Q-Learning (DTQL), a novel offline RL method that combines diffusion policies with a trust region approach to improve efficiency and performance without iterative sampling.

Contribution

The paper proposes DTQL, which removes iterative denoising from action generation in diffusion-based offline RL by pairing a dual-policy design with a trust region loss, improving both speed and benchmark performance.

Findings

1. DTQL outperforms existing methods on D4RL benchmarks.

2. DTQL achieves faster training and inference speeds.

3. DTQL maintains high policy expressiveness and exploration capabilities.

Abstract

Offline reinforcement learning (RL) leverages pre-collected datasets to train optimal policies. Diffusion Q-Learning (DQL), which introduces diffusion models as a powerful and expressive policy class, significantly boosts the performance of offline RL. However, its reliance on iterative denoising sampling to generate actions slows down both training and inference. While several recent attempts have tried to accelerate DQL, the improvement in training and/or inference speed often comes at the cost of degraded performance. In this paper, we introduce a dual policy approach, Diffusion Trusted Q-Learning (DTQL), which comprises a diffusion policy for pure behavior cloning and a practical one-step policy. We bridge the two policies with a newly introduced diffusion trust region loss. The diffusion policy maintains expressiveness, while the trust region loss directs the one-step policy to explore…
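
To make the idea of a diffusion trust region loss concrete, here is a toy sketch. The abstract does not give the loss itself, so the form used here is an assumption: the frozen behavior-cloning diffusion model's denoising error, evaluated on an action proposed by the one-step policy. The learned diffusion policy is replaced by the closed-form optimal denoiser for 1-D Gaussian behavior data, and every name (`gaussian_denoiser`, `trust_region_loss`, `SIGMAS`) and number is hypothetical, not taken from the paper or its repository.

```python
import random


def gaussian_denoiser(noisy_action, sigma, data_mean, data_std):
    """Closed-form E[a | a + sigma * eps] when behavior actions ~ N(data_mean, data_std^2).

    Stand-in (assumption) for a trained behavior-cloning diffusion policy.
    """
    shrink = data_std ** 2 / (data_std ** 2 + sigma ** 2)
    return data_mean + shrink * (noisy_action - data_mean)


def trust_region_loss(action, sigmas, data_mean, data_std, n_samples=2000, seed=0):
    """Monte Carlo estimate of the assumed trust region loss for one candidate action.

    The candidate action is noised, passed through the frozen denoiser, and
    penalized by the squared distance between the denoised output and itself.
    """
    rng = random.Random(seed)  # fixed seed so both calls below see the same noise
    total = 0.0
    for _ in range(n_samples):
        sigma = rng.choice(sigmas)      # random noise level
        eps = rng.gauss(0.0, 1.0)       # standard Gaussian noise
        denoised = gaussian_denoiser(action + sigma * eps, sigma, data_mean, data_std)
        total += (denoised - action) ** 2
    return total / n_samples


SIGMAS = [0.1, 0.5, 1.0]  # hypothetical noise levels

# Behavior data is centered at 1.0; compare an in-support and a far-away action.
in_support = trust_region_loss(1.0, SIGMAS, data_mean=1.0, data_std=0.5)
out_of_support = trust_region_loss(4.0, SIGMAS, data_mean=1.0, data_std=0.5)
print(in_support < out_of_support)
```

Under these assumptions the loss is small for actions the behavior data supports and grows as the candidate moves away from it, and evaluating it needs only a single denoiser call per sample rather than an iterative denoising chain. How DTQL combines such a term with value maximization is not covered by the abstract and is left out here.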

Code & Models

Repositories

tianyucodings/diffusion_trusted_q_learning
PyTorch · Official

Taxonomy

Topics: Reinforcement Learning in Robotics · Blockchain Technology Applications and Security

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Q-Learning · Diffusion