Boosting Continuous Control with Consistency Policy

Yuhui Chen; Haoran Li; Dongbin Zhao

arXiv:2310.06343 · cs.LG · January 25, 2024 · 1 citation

TL;DR

This paper introduces CPQL, a method that improves diffusion-model-based offline and online reinforcement learning by deriving actions in a single step, which significantly improves both efficiency and guidance accuracy.

Contribution

The paper proposes CPQL, a time-efficient policy that maps reverse diffusion trajectories directly to actions, addressing both the speed and the guidance-accuracy problems of diffusion-based reinforcement learning.
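The mapping from reverse diffusion trajectories to actions can be illustrated with the consistency-model parameterization of Song et al. The sketch below is a minimal illustration under stated assumptions, not the authors' code: the constants SIGMA_DATA, EPS and T_MAX are borrowed from the original consistency-model setup and may differ from CPQL's settings, and raw_net is a hypothetical stand-in for the learned network. The skip and output coefficients force f(a, EPS, s) = a at the smallest diffusion time, so a single call at T_MAX maps pure noise straight to an action.

```python
import math
import random

# Assumed constants (consistency-model defaults, not necessarily CPQL's values)
SIGMA_DATA = 0.5   # assumed standard deviation of the action data
EPS = 0.002        # assumed smallest diffusion time (boundary point)
T_MAX = 80.0       # assumed largest diffusion time (pure noise)


def c_skip(t):
    """Skip coefficient: equals 1 at t = EPS."""
    return SIGMA_DATA**2 / ((t - EPS) ** 2 + SIGMA_DATA**2)


def c_out(t):
    """Output coefficient: equals 0 at t = EPS."""
    return SIGMA_DATA * (t - EPS) / math.sqrt(SIGMA_DATA**2 + t**2)


def raw_net(a_t, t, s):
    """Hypothetical stand-in for the learned network F_theta(a_t, t, s)."""
    return math.tanh(0.3 * a_t + 0.1 * t + s)


def consistency_fn(a_t, t, s):
    """f_theta(a_t, t, s) = c_skip(t) * a_t + c_out(t) * F_theta(a_t, t, s)."""
    return c_skip(t) * a_t + c_out(t) * raw_net(a_t, t, s)


def sample_action(s, rng):
    """Single-step sampling: draw noise at T_MAX and map it to an action in one call."""
    a_T = T_MAX * rng.gauss(0.0, 1.0)
    return consistency_fn(a_T, T_MAX, s)
```

For example, `sample_action(0.1, random.Random(0))` returns one scalar action after a single network evaluation, whereas a sampler that runs K reverse diffusion steps would call the network K times per action.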

Findings

01. Achieves state-of-the-art performance on 11 offline tasks.
02. Improves inference speed by nearly 45 times.
03. Extends seamlessly to online RL tasks.

Abstract

Due to its training stability and strong expressiveness, the diffusion model has attracted considerable attention in offline reinforcement learning. However, it also brings several challenges: 1) the large number of diffusion steps required makes diffusion-model-based methods time-inefficient and limits their use in real-time control; 2) how to achieve policy improvement with accurate guidance for a diffusion-model-based policy is still an open problem. Inspired by the consistency model, we propose a novel time-efficient method named Consistency Policy with Q-Learning (CPQL), which derives an action from noise in a single step. By establishing a mapping from the reverse diffusion trajectories to the desired policy, we simultaneously address the issues of time efficiency and inaccurate guidance when updating a diffusion-model-based policy with the learned Q-function. We…
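Updating the policy with a learned Q-function can be sketched as a combined objective: a consistency term that keeps the policy close to the dataset actions, minus a term that rewards high critic values for single-step actions. This is a minimal sketch under assumptions, not CPQL's implementation: the consistency term follows the consistency-training loss of Song et al. with a squared-error distance, the time grid and constants (RHO, N, SIGMA_DATA, EPS, T_MAX) are assumed, and policy, q_value, alpha and eta are hypothetical scalar stand-ins. CPQL's exact weighting, distance metric and Q normalization may differ.

```python
import math
import random

# Assumed constants (consistency-model defaults, not necessarily CPQL's values)
SIGMA_DATA, EPS, T_MAX, RHO, N = 0.5, 0.002, 80.0, 7.0, 10


def time_grid(n=N):
    """Karras-style discretization of diffusion time from EPS to T_MAX (assumed)."""
    lo, hi = EPS ** (1 / RHO), T_MAX ** (1 / RHO)
    return [(lo + i / (n - 1) * (hi - lo)) ** RHO for i in range(n)]


def policy(a_t, t, s, w):
    """Hypothetical scalar consistency policy f_w(a_t, t, s) with a single weight w."""
    c_skip = SIGMA_DATA**2 / ((t - EPS) ** 2 + SIGMA_DATA**2)
    c_out = SIGMA_DATA * (t - EPS) / math.sqrt(SIGMA_DATA**2 + t**2)
    return c_skip * a_t + c_out * math.tanh(w * s + 0.1 * a_t)


def q_value(s, a):
    """Hypothetical critic that prefers actions near 0.5 * s."""
    return -((a - 0.5 * s) ** 2)


def policy_loss(batch, w, w_target, alpha, eta, rng):
    """alpha * consistency loss - eta * mean Q of single-step actions (sketch)."""
    grid = time_grid()
    consistency, q_total = 0.0, 0.0
    for s, a in batch:
        n = rng.randrange(N - 1)  # pick an adjacent time pair (t_n, t_{n+1})
        z = rng.gauss(0.0, 1.0)   # shared noise: both points lie on one trajectory
        pred = policy(a + grid[n + 1] * z, grid[n + 1], s, w)
        target = policy(a + grid[n] * z, grid[n], s, w_target)  # target-network output
        consistency += (pred - target) ** 2
        # Q-guidance: one network call maps pure noise at T_MAX to an action
        a_pi = policy(T_MAX * rng.gauss(0.0, 1.0), T_MAX, s, w)
        q_total += q_value(s, a_pi)
    m = len(batch)
    return alpha * consistency / m - eta * q_total / m
```

With eta = 0 the objective reduces to pure consistency training, i.e. behavior cloning of the dataset actions; increasing eta trades some of that fidelity for higher critic values. In a full training loop, w would typically be updated by gradient descent and w_target as an exponential moving average of w; that loop is omitted here.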

Code & Models

Repositories

cccedric/cpql
PyTorch · Official

Taxonomy

Topics: Reinforcement Learning in Robotics

Methods: Diffusion · Q-Learning