Diffusion Policies creating a Trust Region for Offline Reinforcement Learning
Tianyu Chen, Zhendong Wang, Mingyuan Zhou

TL;DR
This paper introduces Diffusion Trusted Q-Learning (DTQL), an offline RL method that pairs a diffusion policy with a trust region loss so that actions can be generated in one step, without iterative denoising.
Contribution
The paper proposes DTQL, which removes iterative denoising from diffusion-based offline RL. It uses a dual policy: a diffusion policy trained purely for behavior cloning and a one-step policy used in practice, with a diffusion trust region loss bridging the two.
Findings
DTQL outperforms existing methods on D4RL benchmarks.
DTQL achieves faster training and inference speeds.
DTQL maintains high policy expressiveness and exploration capabilities.
Abstract
Offline reinforcement learning (RL) leverages pre-collected datasets to train optimal policies. Diffusion Q-Learning (DQL), introducing diffusion models as a powerful and expressive policy class, significantly boosts the performance of offline RL. However, its reliance on iterative denoising sampling to generate actions slows down both training and inference. While several recent attempts have tried to accelerate DQL, the improvement in training and/or inference speed often results in degraded performance. In this paper, we introduce a dual policy approach, Diffusion Trusted Q-Learning (DTQL), which comprises a diffusion policy for pure behavior cloning and a practical one-step policy. We bridge the two policies by a newly introduced diffusion trust region loss. The diffusion policy maintains expressiveness, while the trust region loss directs the one-step policy to explore…
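The abstract does not state the trust region loss itself, so the following is only a rough sketch of one way such a loss could work, assuming it takes a denoising-reconstruction form: noise is added to an action from the one-step policy, a frozen denoiser from the behavior-cloning diffusion policy tries to reconstruct it, and the reconstruction error is the penalty. Everything here is hypothetical and for illustration only: the function names, the single noise level `sigma`, and the closed-form toy denoiser (the optimal denoiser for Gaussian data, standing in for a trained network). It is not the authors' implementation.

```python
import numpy as np


def toy_denoiser(noisy_action, sigma, state, data_mean=0.5, data_std=0.2):
    """Hypothetical stand-in for a trained behavior-cloning denoiser.

    For dataset actions drawn from N(data_mean, data_std^2), the optimal
    denoiser of `action + sigma * noise` is this posterior mean.
    The state is ignored in this toy example.
    """
    shrink = data_std**2 / (data_std**2 + sigma**2)
    return data_mean + shrink * (noisy_action - data_mean)


def trust_region_loss(denoiser, state, policy_action, sigma, rng):
    """Assumed form: reconstruction error of noised policy actions.

    The denoiser is treated as frozen; in training, only the one-step
    policy that produced `policy_action` would receive gradients.
    """
    noisy = policy_action + sigma * rng.standard_normal(policy_action.shape)
    recon = denoiser(noisy, sigma, state)
    return float(np.mean(np.sum((recon - policy_action) ** 2, axis=-1)))


rng = np.random.default_rng(0)
state = np.zeros((256, 3))            # placeholder states (unused by the toy denoiser)
in_support = np.full((256, 2), 0.5)   # actions at the dataset mean
out_support = np.full((256, 2), 5.0)  # actions far from the dataset

loss_in = trust_region_loss(toy_denoiser, state, in_support, 0.5, rng)
loss_out = trust_region_loss(toy_denoiser, state, out_support, 0.5, rng)
print(f"in-support loss:  {loss_in:.4f}")
print(f"out-of-support loss: {loss_out:.4f}")
```

Under these assumptions the penalty stays small for actions the behavior data covers and grows for actions far from it, which is the sense in which such a loss could define a trust region. Evaluating it needs a single denoiser call rather than a full reverse-diffusion chain.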
Taxonomy
Topics: Reinforcement Learning in Robotics · Blockchain Technology Applications and Security
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Q-Learning · Diffusion
