Beyond the Policy Gradient Theorem for Efficient Policy Updates in Actor-Critic Algorithms

Romain Laroche; Remi Tachet

arXiv:2202.07496·cs.LG·February 16, 2022

TL;DR

This paper introduces a policy update for Actor-Critic algorithms that unlearns outdated behavior faster than the update prescribed by the policy gradient theorem and is proven to converge to global optimality.

Contribution

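It studies a policy update based on the gradient of the cross-entropy loss with respect to the action maximizing $q$, shows that this update may decrease value, introduces a modified update without that flaw, proves the modified update converges to global optimality in $O(t^{-1})$ under classic assumptions, and compares it analytically and empirically with standard methods.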

Findings

01. The new update accelerates unlearning in policy optimization.

02. It is proven to converge to global optimality in $O(t^{-1})$ under classic assumptions.

03. Empirical results validate theoretical improvements over traditional policy gradients.

Abstract

In Reinforcement Learning, the optimal action at a given state is dependent on policy decisions at subsequent states. As a consequence, the learning targets evolve with time and the policy optimization process must be efficient at unlearning what it previously learnt. In this paper, we discover that the policy gradient theorem prescribes policy updates that are slow to unlearn because of their structural symmetry with respect to the value target. To increase the unlearning speed, we study a novel policy update: the gradient of the cross-entropy loss with respect to the action maximizing $q$, but find that such updates may lead to a decrease in value. Consequently, we introduce a modified policy update devoid of that flaw, and prove its guarantees of convergence to global optimality in $O(t^{-1})$ under classic assumptions. Further, we assess standard policy updates and our…
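
The unlearning-speed gap described in the abstract can be illustrated with a toy sketch. It is not the authors' modified update, which the abstract does not specify. It assumes a single state, two actions, a softmax policy with logits `theta`, and exact, fixed values `q`; the function names, the initial logits, and the learning rate are all hypothetical choices made for illustration. The policy starts almost fully committed to the worse action, and the code counts the gradient steps each update needs before the better action receives more than half of the probability mass.

```python
import math

def softmax(theta):
    # Numerically stable softmax over a list of logits.
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]

def policy_gradient_step(theta, q):
    # Exact softmax policy gradient in a single state:
    # d/d theta_a of sum_b pi(b) q(b) = pi(a) * (q(a) - v).
    pi = softmax(theta)
    v = sum(p * qa for p, qa in zip(pi, q))
    return [p * (qa - v) for p, qa in zip(pi, q)]

def cross_entropy_step(theta, q):
    # Gradient of log pi(a*) with a* = argmax q:
    # d/d theta_a = 1[a == a*] - pi(a).
    pi = softmax(theta)
    best = max(range(len(q)), key=lambda a: q[a])
    return [(1.0 if a == best else 0.0) - p for a, p in enumerate(pi)]

def steps_to_unlearn(update, theta0, q, lr=1.0, threshold=0.5, max_steps=100_000):
    # Count gradient-ascent steps until pi(argmax q) exceeds `threshold`.
    theta = list(theta0)
    best = max(range(len(q)), key=lambda a: q[a])
    for step in range(max_steps):
        if softmax(theta)[best] > threshold:
            return step
        grad = update(theta, q)
        theta = [t + lr * g for t, g in zip(theta, grad)]
    return max_steps

# Assumed setup: the policy strongly prefers action 0, but action 1 is better.
theta0 = [8.0, 0.0]
q = [0.0, 1.0]

pg_steps = steps_to_unlearn(policy_gradient_step, theta0, q)
ce_steps = steps_to_unlearn(cross_entropy_step, theta0, q)
print("policy gradient steps:", pg_steps)
print("cross-entropy steps:", ce_steps)
```

In this setting the policy gradient is scaled by the probability the policy currently assigns to each action, so it nearly vanishes when the policy is confidently wrong, while the cross-entropy gradient stays close to 1 for the maximizing action. The sketch only covers the unlearning side of the story: with a single state and exact `q` it cannot exhibit the decrease in value that the abstract attributes to the plain cross-entropy update.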

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Memory and Neural Computing · Ferroelectric and Negative Capacitance Devices