Off-Policy Actor-Critic

Thomas Degris; Martha White; Richard S. Sutton

arXiv:1205.4839 · cs.LG · March 20, 2015 · 56 cites

TL;DR

This paper introduces the first off-policy actor-critic algorithm. It is online and incremental, and it combines off-policy learning (learning a target policy from data generated by a different behavior policy) with the explicit, possibly stochastic policy representation of actor-critic methods, which makes large action spaces practical.

Contribution

It presents a novel incremental off-policy actor-critic algorithm whose per-time-step complexity is linear in the number of learned weights, extending off-policy gradient temporal-difference methods to the actor-critic framework.
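To make "incremental, with linear per-step complexity" concrete, here is a minimal sketch of one off-policy actor-critic update. It assumes linear state features, a softmax (Gibbs) actor with one weight vector per action, and a behavior policy whose probability b(a|s) is known. It is a simplification, not the paper's algorithm: the critic is plain importance-weighted TD(0) instead of the gradient-TD critic the paper builds on, eligibility traces are omitted, and the helper names (`dot`, `softmax_probs`, `off_policy_ac_step`) are hypothetical.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax_probs(u, x):
    """pi(.|s) for a Gibbs policy with one weight vector per action: pi(a|s) ∝ exp(u[a] . x)."""
    prefs = [dot(u_a, x) for u_a in u]
    m = max(prefs)                            # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]

def off_policy_ac_step(v, u, x, a, r, x_next, b_prob,
                       gamma=0.99, alpha_v=0.1, alpha_u=0.01):
    """One incremental off-policy actor-critic update (hypothetical helper).

    v: critic weights (len n); u: actor weights (n_actions lists of len n).
    x, x_next: feature vectors of s and s'; a: action chosen by the behavior
    policy; b_prob: b(a|s). Mutates v and u in place; returns (delta, rho).
    Cost is O(n * n_actions), i.e. linear in the number of learned weights.
    """
    pi = softmax_probs(u, x)
    rho = pi[a] / b_prob                             # importance ratio pi(a|s) / b(a|s)
    delta = r + gamma * dot(v, x_next) - dot(v, x)   # TD error of the linear critic

    # Critic: importance-weighted TD(0) (simplification: not gradient-TD, no traces).
    for i in range(len(v)):
        v[i] += alpha_v * rho * delta * x[i]

    # Actor: u += alpha_u * rho * delta * grad log pi(a|s), where for the Gibbs
    # policy d log pi(a|s) / d u[k] = (1[k == a] - pi(k|s)) * x.
    for k in range(len(u)):
        coeff = (1.0 if k == a else 0.0) - pi[k]
        for i in range(len(x)):
            u[k][i] += alpha_u * rho * delta * coeff * x[i]
    return delta, rho

# Usage: a one-state, two-action bandit. The behavior policy is uniform, but the
# target (softmax) policy should learn to prefer action 1, which pays reward 1.
rng = random.Random(0)
v = [0.0]
u = [[0.0], [0.0]]
x = [1.0]
for _ in range(2000):
    a = rng.randrange(2)                      # act with b(.|s) = uniform
    r = 1.0 if a == 1 else 0.0
    off_policy_ac_step(v, u, x, a, r, x, b_prob=0.5,
                       gamma=0.0, alpha_v=0.1, alpha_u=0.1)

print(softmax_probs(u, x))                    # learned target policy pi(.|s)
```

Each weight is touched a constant number of times per step, which is where the linear cost comes from. Importance-weighted TD(0) with function approximation can diverge off-policy in general, which is the problem gradient-TD critics are designed to avoid, so treat this only as an illustration of the update structure.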

Findings

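01 Achieves better or comparable performance relative to existing algorithms on standard reinforcement-learning benchmark problems

02 Proves convergence under assumptions similar to those used for previous off-policy algorithms

03 Per-time-step complexity scales linearly with the number of learned weights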

Abstract

This paper presents the first actor-critic algorithm for off-policy reinforcement learning. Our algorithm is online and incremental, and its per-time-step complexity scales linearly with the number of learned weights. Previous work on actor-critic algorithms is limited to the on-policy setting and does not take advantage of the recent advances in off-policy gradient temporal-difference learning. Off-policy techniques, such as Greedy-GQ, enable a target policy to be learned while following and obtaining data from another (behavior) policy. For many problems, however, actor-critic methods are more practical than action value methods (like Greedy-GQ) because they explicitly represent the policy; consequently, the policy can be stochastic and utilize a large action space. In this paper, we illustrate how to practically combine the generality and learning potential of off-policy learning…

Code & Models

Repositories

DevSlem/OfflineRL/blob/main/chap1_introduction.ipynb

Taxonomy

Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Optimization and Search Problems