Off-Policy Actor-Critic with Emphatic Weightings

Eric Graves; Ehsan Imani; Raksha Kumaraswamy; Martha White

arXiv:2111.08172 · cs.LG · April 17, 2023 · 1 citation

TL;DR

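This paper introduces ACE (Actor Critic with Emphatic weightings), an off-policy actor-critic algorithm whose gradient estimate uses emphatic weightings. It targets a failure of previous semi-gradient methods such as OffPAC and DPG, which the authors show converge to the wrong solution on a counterexample.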

Contribution

The paper derives a unified off-policy policy gradient theorem using emphatic weightings and interest functions, and proposes the ACE algorithm with proven convergence properties.

Findings

01 ACE outperforms previous methods like OffPAC in experiments.

02 Direct approximation of emphatic weightings improves stability and performance.

03 ACE converges to the optimal solution in tested environments.

Abstract

A variety of theoretically-sound policy gradient algorithms exist for the on-policy setting due to the policy gradient theorem, which provides a simplified form for the gradient. The off-policy setting, however, has been less clear due to the existence of multiple objectives and the lack of an explicit off-policy policy gradient theorem. In this work, we unify these objectives into one off-policy objective, and provide a policy gradient theorem for this unified objective. The derivation involves emphatic weightings and interest functions. We show multiple strategies to approximate the gradients, in an algorithm called Actor Critic with Emphatic weightings (ACE). We prove in a counterexample that previous (semi-gradient) off-policy actor-critic methods--particularly Off-Policy Actor-Critic (OffPAC) and Deterministic Policy Gradient (DPG)--converge to the wrong solution whereas ACE finds…
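
To make the role of emphatic weightings and interest functions more concrete, here is a minimal sketch of how a per-step emphatic weighting could be accumulated along one off-policy trajectory. It assumes the follow-on trace recursion F_t = gamma * rho_{t-1} * F_{t-1} + i(S_t) and the interpolation M_t = (1 - lambda_a) * i(S_t) + lambda_a * F_t. The function name, argument names, the constant discount, and the trade-off parameter lambda_a are illustrative assumptions, not code from the authors' repository.

```python
def emphatic_weightings(rhos, interests, gamma, lambda_a):
    """Return an emphatic weighting M_t for each step of one trajectory.

    Hypothetical helper for illustration only (assumed recursion, see text).
    rhos[t]      : importance sampling ratio pi(A_t|S_t) / mu(A_t|S_t)
    interests[t] : interest i(S_t) in the state visited at step t
    gamma        : constant discount factor (an assumption of this sketch)
    lambda_a     : trade-off in [0, 1]; 0 ignores the follow-on trace
    """
    weightings = []
    followon = 0.0  # trace value before the first step
    prev_rho = 1.0  # has no effect at t = 0 because followon starts at 0
    for rho, interest in zip(rhos, interests):
        # Follow-on trace: discounted, importance-corrected sum of past interest.
        followon = gamma * prev_rho * followon + interest
        # Interpolate between the current interest alone and the full trace.
        weightings.append((1.0 - lambda_a) * interest + lambda_a * followon)
        prev_rho = rho
    return weightings


if __name__ == "__main__":
    rhos = [2.0, 0.5, 1.0]
    interests = [1.0, 1.0, 1.0]
    print(emphatic_weightings(rhos, interests, gamma=0.5, lambda_a=1.0))
    print(emphatic_weightings(rhos, interests, gamma=0.5, lambda_a=0.0))
```

In an actor update, each M_t would scale the usual importance-weighted policy gradient term for step t. Under the assumptions above, setting lambda_a to 0 with unit interest gives every step a weighting of 1, which corresponds to a semi-gradient update with no emphasis correction.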

Code & Models

Repositories

gravesec/actor-critic-with-emphatic-weightings (official)

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning