Cautious Actor-Critic
Lingwei Zhu, Toshinori Kitamura, Takamitsu Matsubara

TL;DR
This paper introduces Cautious Actor-Critic (CAC), a new off-policy reinforcement learning algorithm that improves learning stability by combining conservative policy updates with conservative value updates, making it suitable for stability-critical applications.
Contribution
The paper proposes a novel off-policy actor-critic algorithm that integrates conservative policy iteration and entropy-regularized value updates for improved stability.
Findings
CAC achieves comparable performance to state-of-the-art methods.
CAC significantly stabilizes learning in continuous control tasks.
The entropy-regularized critic simplifies the actor update process.
Abstract
The oscillating performance of off-policy learning and persisting errors in the actor-critic (AC) setting call for algorithms that can learn conservatively to better suit stability-critical applications. In this paper, we propose a novel off-policy AC algorithm, cautious actor-critic (CAC). The name "cautious" comes from its doubly conservative nature: we exploit the classic policy interpolation from conservative policy iteration for the actor and the entropy regularization of conservative value iteration for the critic. Our key observation is that the entropy-regularized critic facilitates and simplifies the unwieldy interpolated actor update while still ensuring robust policy improvement. We compare CAC to state-of-the-art AC methods on a set of challenging continuous control problems and demonstrate that CAC achieves comparable performance while significantly stabilizing learning.
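The two conservative ingredients named in the abstract can be illustrated with a small tabular sketch. This is not the paper's continuous-control algorithm: it assumes a single state with four discrete actions, a Boltzmann (softmax) policy standing in for the entropy-regularized greedy step, and a hand-picked mixing coefficient `zeta`. The function names, the temperature, and all numbers are hypothetical and chosen only for illustration.

```python
import math

def softmax_policy(q_values, temperature=1.0):
    """Boltzmann policy over discrete actions: pi(a) proportional to exp(Q(a) / temperature)."""
    m = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]

def interpolate_policy(old_policy, target_policy, zeta):
    """Conservative-policy-iteration-style mixture: (1 - zeta) * old + zeta * target."""
    if not 0.0 <= zeta <= 1.0:
        raise ValueError("zeta must lie in [0, 1]")
    return [(1.0 - zeta) * p_old + zeta * p_new
            for p_old, p_new in zip(old_policy, target_policy)]

# Hypothetical single-state example (assumed values, not from the paper).
old_policy = [0.25, 0.25, 0.25, 0.25]
q_values = [1.0, 2.0, 0.5, 0.0]
zeta = 0.1

target_policy = softmax_policy(q_values, temperature=0.5)
new_policy = interpolate_policy(old_policy, target_policy, zeta)

# L1 distance between consecutive policies is at most 2 * zeta,
# which is what keeps each update small.
l1_change = sum(abs(p_new - p_old) for p_new, p_old in zip(new_policy, old_policy))
```

With a small `zeta`, the new policy shifts probability toward the highest-valued action but stays close to the old one, which is the sense in which the interpolated update is conservative. How CAC chooses the mixing coefficient and handles continuous actions is described in the paper itself and is not reproduced here.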
Taxonomy
Topics: Reinforcement Learning in Robotics · Model Reduction and Neural Networks · Adaptive Dynamic Programming Control
