Beyond the Policy Gradient Theorem for Efficient Policy Updates in Actor-Critic Algorithms
Romain Laroche, Remi Tachet

TL;DR
This paper introduces a novel policy update for Actor-Critic algorithms that speeds up unlearning and guarantees convergence, addressing the slow unlearning of standard policy-gradient updates in reinforcement learning.
Contribution
It proposes a new policy update based on a cross-entropy loss, proves its convergence to global optimality, and compares it analytically and empirically with standard policy updates.
Findings
The new update accelerates unlearning in policy optimization.
It guarantees convergence to the global optimum under standard assumptions.
Empirical results validate theoretical improvements over traditional policy gradients.
Abstract
In Reinforcement Learning, the optimal action at a given state is dependent on policy decisions at subsequent states. As a consequence, the learning targets evolve with time and the policy optimization process must be efficient at unlearning what it previously learnt. In this paper, we discover that the policy gradient theorem prescribes policy updates that are slow to unlearn because of their structural symmetry with respect to the value target. To increase the unlearning speed, we study a novel policy update: the gradient of the cross-entropy loss with respect to the action maximizing q, but find that such updates may lead to a decrease in value. Consequently, we introduce a modified policy update devoid of that flaw, and prove its guarantees of convergence to global optimality under classic assumptions. Further, we assess standard policy updates and our…
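The unlearning problem described in the abstract can be illustrated with a toy single-state (bandit) sketch. It assumes a tabular softmax policy and a fixed action-value vector q, and compares the exact policy-gradient step, whose component on action b is π(b)(q(b) − v), with a plain cross-entropy step toward a* = argmax q, whose component is 1[b = a*] − π(b). The setup, the step size, and the helpers `pg_update`, `ce_update` and `steps_to_switch` are hypothetical illustrations, not the paper's code, and the cross-entropy step shown is the unmodified one the abstract warns can decrease value, not the authors' corrected update.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def pg_update(logits, q, lr):
    """Policy-gradient step at a single state with q held fixed:
    d v / d logit_b = pi(b) * (q(b) - v), where v = sum_a pi(a) q(a)."""
    pi = softmax(logits)
    v = pi @ q
    return logits + lr * pi * (q - v)

def ce_update(logits, q, lr):
    """Plain cross-entropy step toward the greedy action a* = argmax q:
    d log pi(a*) / d logit_b = 1[b == a*] - pi(b)."""
    pi = softmax(logits)
    target = np.zeros_like(pi)
    target[np.argmax(q)] = 1.0
    return logits + lr * (target - pi)

def steps_to_switch(update, logits, q, lr=0.1, max_steps=100_000):
    """Hypothetical helper: number of updates applied before the policy
    puts more than half its mass on argmax q (max_steps if never)."""
    best = np.argmax(q)
    for t in range(max_steps):
        if softmax(logits)[best] > 0.5:
            return t
        logits = update(logits, q, lr)
    return max_steps

# A policy that has already committed to action 0 ...
logits0 = np.array([5.0, 0.0, 0.0])
# ... after the target changed so that action 1 is now optimal.
q = np.array([0.0, 1.0, 0.5])

print("PG steps:", steps_to_switch(pg_update, logits0.copy(), q))
print("CE steps:", steps_to_switch(ce_update, logits0.copy(), q))
```

Under these assumptions the cross-entropy step switches to the new best action in far fewer updates: its gradient on the logit of a* is 1 − π(a*), whereas the policy-gradient component π(a*)(q(a*) − v) shrinks toward zero exactly when the policy has nearly stopped choosing a*, which is when unlearning is needed.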
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Memory and Neural Computing · Ferroelectric and Negative Capacitance Devices
