Efficient Diffusion Policies for Offline Reinforcement Learning

Bingyi Kang; Xiao Ma; Chao Du; Tianyu Pang; Shuicheng Yan

arXiv:2305.20081 · cs.LG · October 27, 2023 · 5 citations

TL;DR

This paper introduces efficient diffusion policy (EDP) for offline reinforcement learning. EDP cuts diffusion-policy training time (from 5 days to 5 hours on gym-locomotion tasks), makes diffusion policies compatible with a range of offline RL algorithms, and achieves state-of-the-art results on the D4RL benchmarks.

Contribution

The paper proposes EDP, a method that speeds up diffusion-policy training and makes diffusion policies compatible with maximum likelihood-based RL algorithms.

Findings

01

EDP reduces training time from 5 days to 5 hours on gym-locomotion tasks.

02

EDP achieves new state-of-the-art performance on D4RL benchmarks.

03

EDP is compatible with multiple offline RL algorithms, such as TD3, CRR, and IQL.
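Algorithms such as CRR and IQL extract a policy by maximizing an advantage-weighted log-likelihood, roughly E[w(s, a) log π(a | s)], and that log-likelihood is exactly the term a diffusion policy cannot evaluate in closed form. A standard workaround is to replace −log π(a | s) with the denoising (ELBO-style) loss ‖ε − ε_θ(a_k, k, s)‖². The minimal NumPy sketch below shows that substitution. It is one such approximation, and EDP's exact surrogate may differ. The helper names (`advantage_weights`, `weighted_denoising_loss`), the exponentiated-advantage weighting exp(β·A), β = 3.0, and the clip at 100 are assumptions borrowed from common IQL-style implementations, not the paper's configuration.

```python
import numpy as np


def advantage_weights(adv, beta=3.0, max_weight=100.0):
    """Hypothetical helper: exponentiated-advantage weights exp(beta * A), clipped for stability."""
    return np.minimum(np.exp(beta * adv), max_weight)


def weighted_denoising_loss(noise, eps_hat, weights):
    """Hypothetical surrogate for -E[w(s, a) * log pi(a | s)].

    The per-sample squared noise-prediction error stands in for the
    intractable negative log-likelihood of the diffusion policy.
    """
    per_sample = np.sum((noise - eps_hat) ** 2, axis=-1)  # shape (batch,)
    return np.mean(weights * per_sample)


# Usage: 3 samples with 2-dimensional actions.
noise = np.zeros((3, 2))
eps_hat = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # per-sample errors: 1, 1, 2

# All advantages zero -> all weights 1 -> plain mean of the errors, 4/3.
uniform = weighted_denoising_loss(noise, eps_hat, advantage_weights(np.zeros(3)))

# Higher-advantage samples dominate: (e + 1/e + 2) / 3.
adv = np.array([1.0, -1.0, 0.0])
weighted = weighted_denoising_loss(noise, eps_hat, advantage_weights(adv, beta=1.0))
```

In a real training loop, `eps_hat` would come from the diffusion network evaluated at a corrupted dataset action. CRR's alternative binary weighting 1[A > 0] could be swapped into `advantage_weights` without changing the loss.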

Abstract

Offline reinforcement learning (RL) aims to learn optimal policies from offline datasets, where the parameterization of policies is crucial but often overlooked. Recently, Diffusion-QL significantly boosted the performance of offline RL by representing a policy with a diffusion model, whose success relies on a parametrized Markov chain with hundreds of steps for sampling. However, Diffusion-QL suffers from two critical limitations. 1) It is computationally inefficient to forward and backward through the whole Markov chain during training. 2) It is incompatible with maximum likelihood-based RL algorithms (e.g., policy gradient methods) as the likelihood of diffusion models is intractable. Therefore, we propose efficient diffusion policy (EDP) to overcome these two challenges. EDP approximately constructs actions from corrupted ones at training to avoid running the sampling chain. We…
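The idea of constructing actions from corrupted ones instead of running the sampling chain can be illustrated with the standard DDPM identity. Corrupting a dataset action to a_k = √ᾱ_k a_0 + √(1 − ᾱ_k) ε and inverting with a predicted noise gives an estimate â_0 from a single network call rather than K reverse steps. This is a minimal NumPy sketch under stated assumptions, not the official JAX code in sail-sg/edp. The linear noise schedule, the helper names (`linear_alpha_bar`, `corrupt_action`, `approximate_action`), and the use of the true noise as an "oracle" prediction in place of a trained network ε_θ(a_k, k, s) are all hypothetical.

```python
import numpy as np


def linear_alpha_bar(num_steps, beta_start=1e-4, beta_end=2e-2):
    """Hypothetical linear schedule; returns alpha_bar_k = prod_{i<=k} (1 - beta_i)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def corrupt_action(a0, k, alpha_bar, noise):
    """Forward diffusion q(a_k | a_0): a_k = sqrt(ab_k) * a_0 + sqrt(1 - ab_k) * eps."""
    ab = alpha_bar[k][:, None]  # (batch, 1), broadcasts over action dimensions
    return np.sqrt(ab) * a0 + np.sqrt(1.0 - ab) * noise


def approximate_action(ak, k, alpha_bar, eps_hat):
    """One-step estimate a0_hat = (a_k - sqrt(1 - ab_k) * eps_hat) / sqrt(ab_k).

    In training, eps_hat would come from one call to a noise-prediction
    network eps_theta(a_k, k, s), so an actor loss can use a0_hat without
    running the K-step reverse chain.
    """
    ab = alpha_bar[k][:, None]
    return (ak - np.sqrt(1.0 - ab) * eps_hat) / np.sqrt(ab)


# Usage: a batch of 4 six-dimensional actions, one random timestep per sample.
rng = np.random.default_rng(0)
num_steps = 100
alpha_bar = linear_alpha_bar(num_steps)
a0 = rng.uniform(-1.0, 1.0, size=(4, 6))
noise = rng.standard_normal(a0.shape)
k = rng.integers(0, num_steps, size=a0.shape[0])

a_k = corrupt_action(a0, k, alpha_bar, noise)
# "Oracle" predictor (eps_hat = true noise) stands in for a trained eps_theta,
# so a0_hat matches a0 up to floating-point error.
a0_hat = approximate_action(a_k, k, alpha_bar, noise)
```

With a trained but imperfect noise predictor, â_0 only approximates a_0. Its error is the noise-prediction error scaled by √(1 − ᾱ_k)/√ᾱ_k, so the estimate degrades at timesteps where ᾱ_k is small.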

Code & Models

Repositories

sail-sg/edp (JAX, official)

Videos

Efficient Diffusion Policies for Offline Reinforcement Learning · SlidesLive

Taxonomy

Topics: Robotic Locomotion and Control · Reinforcement Learning in Robotics · Muscle activation and electromyography studies

Methods: Diffusion