Boosting Continuous Control with Consistency Policy
Yuhui Chen, Haoran Li, Dongbin Zhao

TL;DR
This paper introduces CPQL, a novel method that enhances diffusion model-based offline and online reinforcement learning by enabling single-step action derivation, significantly improving efficiency and guidance accuracy.
Contribution
The paper proposes CPQL, a time-efficient policy method that maps reverse diffusion trajectories to actions, addressing speed and guidance issues in diffusion-based reinforcement learning.
Findings
Achieves state-of-the-art performance on 11 offline tasks.
Improves inference speed by nearly 45 times compared to Diffusion-QL.
Seamlessly extends to online RL tasks.
Abstract
Due to its training stability and strong expressiveness, the diffusion model has attracted considerable attention in offline reinforcement learning. However, it also brings several challenges: 1) The large number of diffusion steps required makes diffusion model-based methods time-inefficient and limits their application in real-time control; 2) How to achieve policy improvement with accurate guidance for a diffusion model-based policy is still an open problem. Inspired by the consistency model, we propose a novel time-efficient method named Consistency Policy with Q-Learning (CPQL), which derives actions from noise in a single step. By establishing a mapping from the reverse diffusion trajectories to the desired policy, we simultaneously address the issues of time efficiency and inaccurate guidance when updating a diffusion model-based policy with the learned Q-function. We…
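The single-step idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class name ToyConsistencyPolicy, the one-layer stand-in network, and the constants SIGMA_DATA, EPS, and T_MAX are all assumptions made for illustration, and the skip/output scaling follows the parameterization commonly used for consistency models rather than anything stated on this page. The sketch shows only the inference path: one network evaluation maps Gaussian noise, conditioned on the state, to an action, where a diffusion policy would run many denoising steps.

```python
import numpy as np

# Assumed constants (typical values from the consistency-model literature,
# not taken from this page).
SIGMA_DATA = 0.5   # assumed scale of the action data
EPS = 0.002        # assumed smallest noise level
T_MAX = 80.0       # assumed largest noise level


class ToyConsistencyPolicy:
    """Untrained, hypothetical stand-in for a state-conditioned consistency model."""

    def __init__(self, state_dim, action_dim, seed=0):
        rng = np.random.default_rng(seed)
        # One random linear layer plays the role of the network F(a_t, t, s);
        # a real policy would use a trained MLP here.
        self.W = rng.normal(scale=0.1, size=(state_dim + action_dim + 1, action_dim))
        self.num_evals = 0  # counts network evaluations

    def _network(self, noisy_action, t, state):
        self.num_evals += 1
        features = np.concatenate([state, noisy_action, [np.log(t)]])
        return np.tanh(features @ self.W)

    def __call__(self, noisy_action, t, state):
        # Skip/output scaling chosen so that f(a, EPS | s) = a (boundary condition).
        c_skip = SIGMA_DATA**2 / ((t - EPS) ** 2 + SIGMA_DATA**2)
        c_out = SIGMA_DATA * (t - EPS) / np.sqrt(SIGMA_DATA**2 + t**2)
        return c_skip * noisy_action + c_out * self._network(noisy_action, t, state)

    def act(self, state, rng):
        # Single step: draw noise at the largest noise level and map it to an action.
        noise = T_MAX * rng.standard_normal(self.W.shape[1])
        action = self(noise, T_MAX, state)
        return np.clip(action, -1.0, 1.0)  # assumes actions are bounded in [-1, 1]


rng = np.random.default_rng(1)
policy = ToyConsistencyPolicy(state_dim=3, action_dim=2)
action = policy.act(np.zeros(3), rng)
print(action.shape, policy.num_evals)
```

Because the weights are random, the resulting action is meaningless; the point is that act calls the network exactly once. Training such a model and guiding it with a learned Q-function, as CPQL does, is outside the scope of this sketch.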
Taxonomy
Topics: Reinforcement Learning in Robotics
Methods: Diffusion · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Q-Learning
