Efficient Diffusion Policies for Offline Reinforcement Learning
Bingyi Kang, Xiao Ma, Chao Du, Tianyu Pang, Shuicheng Yan

TL;DR
This paper introduces an efficient diffusion policy (EDP) for offline reinforcement learning that significantly reduces training time and improves compatibility with various algorithms, achieving state-of-the-art results on D4RL benchmarks.
Contribution
The paper proposes EDP, a novel method that accelerates diffusion policy training and enhances compatibility with maximum likelihood-based RL algorithms.
Findings
EDP reduces training time from 5 days to 5 hours on gym-locomotion tasks.
EDP achieves new state-of-the-art performance on D4RL benchmarks.
EDP is compatible with multiple offline RL algorithms like TD3, CRR, and IQL.
Abstract
Offline reinforcement learning (RL) aims to learn optimal policies from offline datasets, where the parameterization of policies is crucial but often overlooked. Recently, Diffusion-QL significantly boosted the performance of offline RL by representing a policy with a diffusion model, whose success relies on a parametrized Markov chain with hundreds of steps for sampling. However, Diffusion-QL suffers from two critical limitations. 1) It is computationally inefficient to forward and backward through the whole Markov chain during training. 2) It is incompatible with maximum likelihood-based RL algorithms (e.g., policy gradient methods) as the likelihood of diffusion models is intractable. Therefore, we propose efficient diffusion policy (EDP) to overcome these two challenges. EDP approximately constructs actions from corrupted ones at training to avoid running the sampling chain. We…
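The idea of constructing an action from a corrupted one, rather than running the full sampling chain, can be illustrated with a minimal sketch. It assumes the standard DDPM forward process, a_k = sqrt(ᾱ_k)·a_0 + sqrt(1 − ᾱ_k)·ε, and inverts it in a single step using a noise estimate. The abstract does not give the exact formulation, so the linear beta schedule, the function names, and the use of the true noise in place of a learned noise predictor are all assumptions made for illustration.

```python
import math
import random


def make_alpha_bars(num_steps, beta_start=1e-4, beta_end=2e-2):
    """Cumulative products of (1 - beta_k) for a linear beta schedule (assumed)."""
    alpha_bars = []
    running = 1.0
    for k in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * k / max(num_steps - 1, 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def corrupt_action(action, alpha_bar, noise):
    """Closed-form forward diffusion: a_k = sqrt(ab) * a_0 + sqrt(1 - ab) * eps."""
    s, t = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [s * a + t * e for a, e in zip(action, noise)]


def approximate_action(noisy_action, alpha_bar, predicted_noise):
    """One-step estimate of a_0 from a_k, obtained by inverting the forward formula."""
    s, t = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [(x - t * e) / s for x, e in zip(noisy_action, predicted_noise)]


rng = random.Random(0)
alpha_bars = make_alpha_bars(100)

action = [0.3, -0.7, 0.1]                      # hypothetical dataset action
k = rng.randrange(len(alpha_bars))             # random diffusion step
noise = [rng.gauss(0.0, 1.0) for _ in action]  # noise used to corrupt the action
noisy = corrupt_action(action, alpha_bars[k], noise)

# A trained network would predict the noise from (state, noisy action, k).
# Here the true noise stands in for that prediction, so the clean action
# is recovered up to floating-point error.
recovered = approximate_action(noisy, alpha_bars[k], noise)
```

In practice the noise prediction is imperfect, so the recovered action is only an approximation. The point of the sketch is that it costs one network evaluation instead of a pass through every step of the chain.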
Taxonomy
Topics: Robotic Locomotion and Control · Reinforcement Learning in Robotics · Muscle activation and electromyography studies
Methods: Diffusion
