Convergent Actor-Critic Algorithms Under Off-Policy Training and Function Approximation

Hamid Reza Maei

arXiv:1802.07842 · cs.AI · February 23, 2018 · 23 citations

TL;DR

This paper introduces off-policy Actor-Critic algorithms that use state-value functions instead of state-action value functions, enabling policy learning with guaranteed convergence in large or continuous action spaces.

Contribution

It presents the first convergent off-policy Actor-Critic algorithms using function approximation, extending classical methods with theoretical guarantees.

Findings

01

The algorithms are guaranteed to converge to the optimal solution of the averaged state-value objective.

02

Applicable to large and continuous action spaces.

03

Maintain desirable properties of classical Actor-Critic methods.

Abstract

We present the first class of policy-gradient algorithms that work with both state-value and policy function approximation and are guaranteed to converge under off-policy training. Our solution targets reinforcement learning problems in which the action representation adds to the curse of dimensionality, that is, problems with continuous or large action sets, where estimating state-action value functions (Q functions) becomes infeasible. Using state-value functions helps lift this curse and, as a result, naturally turns our policy-gradient solution into a classical Actor-Critic architecture whose Actor uses the state-value function for its update. Our algorithms, Gradient Actor-Critic and Emphatic Actor-Critic, are derived from the exact gradient of the averaged state-value function objective and are thus guaranteed to converge to its optimal solution, while maintaining all the desirable properties…
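The abstract's central idea is that the Actor can be driven by a state-value critic, so the critic's size depends only on state features and not on the action representation. The Python sketch below illustrates that structure for a one-dimensional continuous action under stated assumptions: a linear critic trained with the off-policy TDC (gradient-TD) update, a Gaussian policy with a linear mean and fixed standard deviation, and an Off-PAC-style actor step proportional to ρ δ ∇ log π(a|s). The names `OffPolicyActorCritic`, `update`, and `b_prob`, the behavior policy, and all step sizes are hypothetical. This is not the Gradient Actor-Critic or Emphatic Actor-Critic update from the paper, whose exact-gradient derivation it does not reproduce, and it carries no convergence guarantee of its own.

```python
import math
from dataclasses import dataclass, field
from typing import List, Tuple


def dot(u: List[float], w: List[float]) -> float:
    """Inner product of two equal-length feature/weight vectors."""
    return sum(a * b for a, b in zip(u, w))


def gaussian_pdf(a: float, mean: float, std: float) -> float:
    """Density of a scalar action a under N(mean, std**2)."""
    z = (a - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


@dataclass
class OffPolicyActorCritic:
    """Hypothetical sketch: linear off-policy TDC critic for v(s) plus an
    Off-PAC-style actor with a Gaussian policy. NOT the paper's Gradient
    Actor-Critic or Emphatic Actor-Critic updates."""
    n_features: int
    gamma: float = 0.9
    alpha_v: float = 0.05   # critic step size
    beta_h: float = 0.01    # step size of the TDC auxiliary weights
    alpha_pi: float = 0.01  # actor step size
    sigma: float = 1.0      # fixed std of the Gaussian target policy
    v: List[float] = field(init=False, default_factory=list)      # state-value weights
    h: List[float] = field(init=False, default_factory=list)      # TDC auxiliary weights
    theta: List[float] = field(init=False, default_factory=list)  # policy-mean weights

    def __post_init__(self) -> None:
        self.v = [0.0] * self.n_features
        self.h = [0.0] * self.n_features
        self.theta = [0.0] * self.n_features

    def update(self, x: List[float], a: float, r: float,
               x_next: List[float], b_prob: float) -> Tuple[float, float]:
        """Process one transition whose action a was drawn by a behavior
        policy with density b_prob at a. Returns (td_error, rho)."""
        mean = dot(self.theta, x)
        rho = gaussian_pdf(a, mean, self.sigma) / b_prob  # importance ratio pi/b
        # TD error uses only the state-value critic; no Q(s, a) is estimated.
        delta = r + self.gamma * dot(self.v, x_next) - dot(self.v, x)
        hx = dot(self.h, x)
        # Off-policy TDC critic: v += alpha * rho * (delta * x - gamma * (h.x) * x_next).
        self.v = [vi + self.alpha_v * rho * (delta * xi - self.gamma * hx * xn)
                  for vi, xi, xn in zip(self.v, x, x_next)]
        # h is a least-squares linear fit of rho * delta from the features.
        self.h = [hi + self.beta_h * (rho * delta - hx) * xi
                  for hi, xi in zip(self.h, x)]
        # Actor: grad of log N(a; theta.x, sigma^2) w.r.t. theta is (a - mean) / sigma^2 * x.
        score = (a - mean) / self.sigma ** 2
        self.theta = [ti + self.alpha_pi * rho * delta * score * xi
                      for ti, xi in zip(self.theta, x)]
        return delta, rho


# Usage: one transition generated by an assumed behavior policy b = N(0, 2^2).
agent = OffPolicyActorCritic(n_features=2)
action = 0.5
delta, rho = agent.update(x=[1.0, 0.0], a=action, r=1.0, x_next=[0.0, 1.0],
                          b_prob=gaussian_pdf(action, 0.0, 2.0))
```

The critic never conditions on the action; the action enters only through the importance ratio ρ = π(a|s)/b(a|s) and the policy score, which is why this structure extends to continuous actions where a Q function over (s, a) would be hard to estimate.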

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Memory and Neural Computing · Adversarial Robustness in Machine Learning