Cautious Actor-Critic

Lingwei Zhu; Toshinori Kitamura; Takamitsu Matsubara

arXiv:2107.05217 · cs.LG · October 6, 2021

TL;DR

This paper introduces Cautious Actor-Critic (CAC), a new off-policy reinforcement learning algorithm that improves stability by combining conservative policy and value updates, making it suitable for stability-critical applications.

Contribution

The paper proposes a novel off-policy actor-critic algorithm that integrates conservative policy iteration and entropy-regularized value updates for improved stability.
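The policy interpolation borrowed from conservative policy iteration mixes the current policy with a greedy policy, π_new = (1 - ζ) π_old + ζ π_greedy, so each update moves the actor only part of the way. The sketch below shows that single step for a small tabular policy. The function names, the toy Q-table, and the fixed coefficient ζ = 0.25 are hypothetical illustrations, not the paper's implementation, which targets continuous actions with function approximation.

```python
import numpy as np

def greedy_policy(q: np.ndarray) -> np.ndarray:
    """One-hot greedy policy pi_greedy(a|s) from a Q-table of shape (S, A)."""
    pi = np.zeros_like(q)
    pi[np.arange(q.shape[0]), q.argmax(axis=1)] = 1.0
    return pi

def interpolate_policy(pi_old: np.ndarray, pi_target: np.ndarray, zeta: float) -> np.ndarray:
    """Conservative (CPI-style) update: pi_new = (1 - zeta) * pi_old + zeta * pi_target."""
    if not 0.0 <= zeta <= 1.0:
        raise ValueError("zeta must lie in [0, 1]")
    return (1.0 - zeta) * pi_old + zeta * pi_target

# Hypothetical toy example: 2 states, 3 actions, uniform starting policy.
q = np.array([[1.0, 2.0, 0.5],
              [0.0, -1.0, 3.0]])
pi_old = np.full((2, 3), 1.0 / 3.0)
pi_new = interpolate_policy(pi_old, greedy_policy(q), zeta=0.25)
print(pi_new)  # each row still sums to 1; the greedy action gains an extra 0.25
```

Because the result is a convex combination of two distributions, it is always a valid policy, and ζ directly bounds how far a single update can move the actor.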

Findings

1. CAC achieves performance comparable to state-of-the-art methods.

2. CAC significantly stabilizes learning in continuous control tasks.

3. The entropy-regularized critic simplifies the actor update.
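One standard reason an entropy-regularized critic helps is that its greedy step has a closed form: the maximizer of Σ_a π(a|s)[Q(s,a) - τ log π(a|s)] is the Boltzmann policy π(a|s) ∝ exp(Q(s,a)/τ), and the maximum equals the log-sum-exp soft value. The sketch below illustrates this generic property for a discrete-action table. The temperature τ, the function names, and the numbers are assumptions for illustration; CAC's actual critic (continuous actions, neural networks, and any further terms from conservative value iteration) is not reproduced.

```python
import numpy as np

def boltzmann_policy(q: np.ndarray, tau: float) -> np.ndarray:
    """Entropy-regularized greedy policy: pi(a|s) proportional to exp(Q(s,a) / tau)."""
    z = (q - q.max(axis=1, keepdims=True)) / tau  # shift by the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def soft_value(q: np.ndarray, tau: float) -> np.ndarray:
    """Soft state value V(s) = tau * log sum_a exp(Q(s,a) / tau), computed stably."""
    m = q.max(axis=1)
    return m + tau * np.log(np.exp((q - m[:, None]) / tau).sum(axis=1))

def soft_bellman_backup(q, r, p, gamma, tau):
    """One entropy-regularized backup: r(s,a) + gamma * sum_s' P(s'|s,a) * V(s').

    Shapes: q and r are (S, A); p is (S, A, S) with p[s, a, s'] = P(s'|s,a).
    """
    return r + gamma * p @ soft_value(q, tau)

# Hypothetical 2-state, 3-action Q-table.
q = np.array([[1.0, 2.0, 0.5],
              [0.0, -1.0, 3.0]])
tau = 1.0
pi = boltzmann_policy(q, tau)
v = soft_value(q, tau)
# The soft value equals the entropy-regularized objective evaluated at the Boltzmann policy.
print(np.allclose(v, (pi * (q - tau * np.log(pi))).sum(axis=1)))  # True

# One backup under hypothetical uniform transitions and zero reward.
p = np.full((2, 3, 2), 0.5)
r = np.zeros((2, 3))
q_next = soft_bellman_backup(q, r, p, gamma=0.9, tau=tau)
```

Since the greedy policy is an explicit distribution rather than an argmax, there is no separate maximization step for the actor to approximate.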

Abstract

The oscillating performance of off-policy learning and the persisting errors in the actor-critic (AC) setting call for algorithms that learn conservatively and are therefore better suited to stability-critical applications. In this paper, we propose a novel off-policy AC algorithm, cautious actor-critic (CAC). The name "cautious" comes from its doubly conservative nature: we exploit the classic policy interpolation of conservative policy iteration for the actor and the entropy regularization of conservative value iteration for the critic. Our key observation is that the entropy-regularized critic facilitates and simplifies the unwieldy interpolated actor update while still ensuring robust policy improvement. We compare CAC to state-of-the-art AC methods on a set of challenging continuous control problems and demonstrate that CAC achieves comparable performance while significantly stabilizing learning.
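The two conservative ingredients can be combined in a toy loop: because the entropy-regularized critic's greedy policy is a closed-form Boltzmann distribution, the interpolated actor update reduces to mixing two explicit distributions. The following is a sketch under strong assumptions (a known two-state MDP, discrete actions, one soft evaluation backup per iteration, and a fixed ζ). It is not the paper's off-policy CAC algorithm, the paper's way of choosing the interpolation coefficient is not reproduced, and all names and numbers are hypothetical.

```python
import numpy as np

def softmax_rows(x: np.ndarray) -> np.ndarray:
    """Row-wise softmax, shifted by the row max for numerical stability."""
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def toy_cautious_iteration(r, p, gamma=0.9, tau=0.5, zeta=0.2, n_iters=200):
    """Hypothetical tabular loop: soft critic backup + interpolated (CPI-style) actor.

    r: rewards, shape (S, A); p: transitions, shape (S, A, S).
    """
    num_states, num_actions = r.shape
    pi = np.full((num_states, num_actions), 1.0 / num_actions)  # uniform start
    q = np.zeros((num_states, num_actions))
    for _ in range(n_iters):
        # Critic: one entropy-regularized evaluation backup for the current policy,
        # V(s) = sum_a pi(a|s) * (Q(s,a) - tau * log pi(a|s)).
        # pi stays strictly positive (it mixes in a softmax), so log(pi) is finite.
        v = (pi * (q - tau * np.log(pi))).sum(axis=1)
        q = r + gamma * p @ v
        # Actor: move only a fraction zeta toward the Boltzmann policy of the critic.
        pi = (1.0 - zeta) * pi + zeta * softmax_rows(q / tau)
    return pi, q

# Toy MDP: action a always moves to state a; action 1 pays reward 1, action 0 pays 0.
p = np.zeros((2, 2, 2))
p[:, 0, 0] = 1.0
p[:, 1, 1] = 1.0
r = np.array([[0.0, 1.0],
              [0.0, 1.0]])
pi, q = toy_cautious_iteration(r, p)
print(pi)
```

In this toy MDP the Q-gap between the two actions is 1, so with τ = 0.5 the actor approaches the Boltzmann probability e^2 / (1 + e^2) ≈ 0.881 for action 1 geometrically, at rate 1 - ζ per iteration, instead of jumping there in one step. A smaller ζ makes each actor step more cautious.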

Taxonomy

Topics: Reinforcement Learning in Robotics · Model Reduction and Neural Networks · Adaptive Dynamic Programming Control