Consistency Models as a Rich and Efficient Policy Class for Reinforcement Learning
Zihan Ding, Chi Jin

TL;DR
This paper introduces the consistency policy, a fast and expressive policy class for reinforcement learning, leveraging consistency models to improve efficiency and performance across offline, offline-to-online, and online settings.
Contribution
The paper proposes the consistency policy as a novel, efficient policy representation based on consistency models, demonstrating its advantages over diffusion policies in RL tasks.
Findings
Consistency policy is more computationally efficient than diffusion policy.
It performs comparably to diffusion policy in the offline-to-online setting and reaches higher average performance in the online setting.
It is evaluated in all three settings: offline, offline-to-online, and online RL.
Abstract
Score-based generative models such as diffusion models have proven effective at modeling multi-modal data in domains ranging from image generation to reinforcement learning (RL). However, the inference process of diffusion models can be slow, which hinders their use in RL, where actions are sampled iteratively. We propose to apply the consistency model as an efficient yet expressive policy representation, namely the consistency policy, with an actor-critic style algorithm for three typical RL settings: offline, offline-to-online and online. For offline RL, we demonstrate the expressiveness of generative models as policies from multi-modal data. For offline-to-online RL, the consistency policy is shown to be more computationally efficient than diffusion policy, with comparable performance. For online RL, the consistency policy demonstrates significant speedup and even higher average performance than the…
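The efficiency argument in the abstract can be made concrete with a toy sketch. The code below is not the authors' implementation: `CountingNet` is a hypothetical, untrained stand-in for a policy network, the constants `SIGMA_DATA`, `EPS`, and `T_MAX` are illustrative values, and the iterative sampler is a generic Euler-style denoising loop used only as a proxy for a multi-step diffusion policy. It assumes the standard consistency-model parameterization f(x, t) = c_skip(t) x + c_out(t) F(x, t), which returns x unchanged at the smallest noise level. The point it illustrates is the number of network evaluations needed per action.

```python
import numpy as np

SIGMA_DATA = 0.5  # assumed data scale (illustrative)
EPS = 0.002       # assumed smallest noise level (illustrative)
T_MAX = 80.0      # assumed largest noise level (illustrative)


class CountingNet:
    """Hypothetical stand-in for a trained network F(state, x, t).

    It is a random linear map followed by tanh, and it counts how many
    times it is evaluated so the two samplers can be compared.
    """

    def __init__(self, state_dim, action_dim, seed=0):
        rng = np.random.default_rng(seed)
        in_dim = state_dim + action_dim + 1  # state, noisy action, log-time
        self.w = rng.normal(scale=0.1, size=(in_dim, action_dim))
        self.calls = 0

    def __call__(self, state, x, t):
        self.calls += 1
        inp = np.concatenate([state, x, [np.log(t)]])
        return np.tanh(inp @ self.w)


def consistency_fn(net, state, x, t):
    """Skip-connection parameterization; equals x exactly when t == EPS."""
    c_skip = SIGMA_DATA**2 / ((t - EPS) ** 2 + SIGMA_DATA**2)
    c_out = SIGMA_DATA * (t - EPS) / np.sqrt(SIGMA_DATA**2 + t**2)
    return c_skip * x + c_out * net(state, x, t)


def sample_consistency_action(net, state, action_dim, rng):
    """One-step sampling: map pure noise at T_MAX straight to an action."""
    x_T = rng.normal(size=action_dim) * T_MAX
    return np.clip(consistency_fn(net, state, x_T, T_MAX), -1.0, 1.0)


def sample_iterative_action(net, state, action_dim, rng, n_steps=5):
    """Generic multi-step Euler denoising loop (proxy for a diffusion policy).

    Each step treats consistency_fn as a denoiser and costs one network call.
    """
    times = np.geomspace(T_MAX, EPS, n_steps + 1)
    x = rng.normal(size=action_dim) * T_MAX
    for t_cur, t_next in zip(times[:-1], times[1:]):
        denoised = consistency_fn(net, state, x, t_cur)
        x = x + (t_next - t_cur) * (x - denoised) / t_cur
    return np.clip(x, -1.0, 1.0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    state = np.zeros(3)

    one_step_net = CountingNet(state_dim=3, action_dim=2)
    sample_consistency_action(one_step_net, state, 2, rng)

    multi_step_net = CountingNet(state_dim=3, action_dim=2)
    sample_iterative_action(multi_step_net, state, 2, rng, n_steps=5)

    print("network calls per action:", one_step_net.calls, "vs", multi_step_net.calls)
```

In this sketch the one-step sampler evaluates the network once per action, while the iterative sampler evaluates it `n_steps` times. In an RL loop that samples an action at every environment step, that per-action difference is the cost the abstract describes as a bottleneck.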
Taxonomy
Topics: Reinforcement Learning in Robotics · Language and cultural evolution
Methods: Diffusion
