Generalizing Consistency Policy to Visual RL with Prioritized Proximal Experience Regularization

Haoran Li; Zhennan Jiang; Yuhui Chen; Dongbin Zhao

arXiv:2410.00051·cs.LG·October 30, 2024

TL;DR

This paper introduces CP3ER, a novel method applying consistency models to visual reinforcement learning, enhancing sample efficiency and training stability in high-dimensional visual tasks, and achieving state-of-the-art results.

Contribution

The paper is presented as the first to extend diffusion/consistency models to visual RL, and it proposes prioritized proximal experience regularization to improve training stability and sample efficiency.

Findings

01 CP3ER outperforms previous methods on 21 tasks.

02 Sample-based entropy regularization stabilizes training.

03 CP3ER achieves state-of-the-art performance in visual RL benchmarks.

Abstract

With high-dimensional state spaces, visual reinforcement learning (RL) faces significant challenges in exploitation and exploration, resulting in low sample efficiency and poor training stability. Although consistency models, a time-efficient class of diffusion models, have been validated in online state-based RL, it is still an open question whether they can be extended to visual RL. In this paper, we investigate the impact of the non-stationary distribution and the actor-critic framework on the consistency policy in online RL, and find that the consistency policy is unstable during training, especially in visual RL with its high-dimensional state space. To this end, we suggest sample-based entropy regularization to stabilize policy training, and propose a consistency policy with prioritized proximal experience regularization (CP3ER) to improve sample efficiency. CP3ER achieves new state-of-the-art…
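To make the "time-efficient" property concrete, the following is a minimal sketch of how a generic consistency model can map noise to an action in a single network evaluation. It is not the CP3ER implementation: the constants `SIGMA_DATA`, `EPS`, and `T_MAX`, the function names, and the `toy_network` stand-in are all assumptions chosen for illustration, and the skip/output parameterization follows the standard consistency-model formulation rather than anything stated on this page.

```python
import numpy as np

# Assumed hyperparameters (illustrative values, not taken from this page).
SIGMA_DATA = 0.5   # assumed scale of the action distribution
EPS = 0.002        # assumed smallest noise level
T_MAX = 80.0       # assumed largest noise level


def c_skip(t):
    # Weight on the noisy input; equals 1 at t = EPS.
    return SIGMA_DATA**2 / ((t - EPS) ** 2 + SIGMA_DATA**2)


def c_out(t):
    # Weight on the network output; equals 0 at t = EPS.
    return SIGMA_DATA * (t - EPS) / np.sqrt(SIGMA_DATA**2 + t**2)


def toy_network(noisy_action, t, state):
    # Hypothetical stand-in for a trained network conditioned on the state
    # (for visual RL, `state` would be an encoded image feature vector).
    return np.tanh(state.mean()) * np.ones_like(noisy_action)


def consistency_fn(noisy_action, t, state, network=toy_network):
    # The parameterization enforces the boundary condition f(a, EPS) = a.
    return c_skip(t) * noisy_action + c_out(t) * network(noisy_action, t, state)


def sample_action(state, action_dim, rng):
    # One-step generation: draw noise at the largest noise level and
    # map it to an action with a single evaluation of the network.
    noise = rng.standard_normal(action_dim) * T_MAX
    action = consistency_fn(noise, T_MAX, state)
    return np.clip(action, -1.0, 1.0)
```

A policy built this way needs one forward pass per action instead of the many denoising steps of a standard diffusion policy, which is the efficiency argument the abstract alludes to.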

Videos

Generalizing Consistency Policy to Visual RL with Prioritized Proximal Experience Regularization (SlidesLive)

Taxonomy

Topics: Online Learning and Analytics

Methods: Consistency Models · Entropy Regularization · Diffusion