Generalizing Consistency Policy to Visual RL with Prioritized Proximal Experience Regularization
Haoran Li, Zhennan Jiang, Yuhui Chen, Dongbin Zhao

TL;DR
This paper introduces CP3ER, a method that applies consistency models to visual reinforcement learning. It improves sample efficiency and training stability on high-dimensional visual tasks and achieves state-of-the-art results.
Contribution
The paper is the first to extend diffusion/consistency models to visual RL, and it proposes prioritized proximal experience regularization to improve stability and efficiency.
Findings
CP3ER outperforms previous methods on 21 tasks.
Sample-based entropy regularization stabilizes training.
CP3ER achieves state-of-the-art performance on visual RL benchmarks.
Abstract
With high-dimensional state spaces, visual reinforcement learning (RL) faces significant challenges in exploitation and exploration, resulting in low sample efficiency and poor training stability. Although consistency models, a time-efficient class of diffusion models, have been validated in online state-based RL, it is still an open question whether they can be extended to visual RL. In this paper, we investigate the impact of non-stationary distribution and the actor-critic framework on consistency policy in online RL, and find that consistency policy is unstable during training, especially in visual RL with its high-dimensional state space. To this end, we suggest sample-based entropy regularization to stabilize the policy training, and propose a consistency policy with prioritized proximal experience regularization (CP3ER) to improve sample efficiency. CP3ER achieves new state-of-the-art…
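The abstract does not spell out how experience is prioritized. As a rough illustration of the general idea of favouring "proximal" experience, the sketch below samples replay-buffer indices with probabilities that decay with the age of a transition. It is a generic recency-weighted sampler, not the paper's method: the exponential weighting, the `temperature` parameter, and the function names `recency_weights` and `sample_indices` are all assumptions made for this example.

```python
import numpy as np


def recency_weights(buffer_size: int, temperature: float = 0.25) -> np.ndarray:
    """Return sampling probabilities that favour recently stored transitions.

    Index 0 is the oldest transition, index buffer_size - 1 the newest.
    The exponential form and `temperature` are illustrative assumptions,
    not taken from the paper.
    """
    # Normalised age in [0, 1]: 1 for the oldest entry, 0 for the newest.
    age = np.arange(buffer_size)[::-1] / max(buffer_size - 1, 1)
    logits = -age / temperature
    # Subtract the max before exponentiating for numerical stability.
    weights = np.exp(logits - logits.max())
    return weights / weights.sum()


def sample_indices(buffer_size: int, batch_size: int,
                   rng: np.random.Generator,
                   temperature: float = 0.25) -> np.ndarray:
    """Draw a batch of buffer indices, biased toward recent experience."""
    probs = recency_weights(buffer_size, temperature)
    return rng.choice(buffer_size, size=batch_size, p=probs)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    batch = sample_indices(buffer_size=1000, batch_size=8, rng=rng)
    print(batch)
```

A smaller `temperature` concentrates sampling on the newest transitions, while a large one approaches uniform replay. In a setup like the one the abstract describes, such a batch would feed the policy-regularization term, though how CP3ER actually defines its priorities should be taken from the paper itself.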
Taxonomy
Topics: Online Learning and Analytics
Methods: Consistency Models · Entropy Regularization · Diffusion
