On the Convergence of Discounted Policy Gradient Methods
Chris Nota

TL;DR
This paper studies the convergence of discounted policy gradient methods in reinforcement learning and shows that if the discount factor is increased slowly, these methods recover the standard guarantees of gradient ascent on the undiscounted objective.
Contribution
It gives a theoretical analysis showing that when the discount factor is increased slowly, at a rate tied to a decreasing learning rate, discounted policy gradient methods enjoy the same convergence guarantees as gradient ascent on the undiscounted objective.
Findings
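Increasing the discount factor slowly, at a rate related to the decreasing learning rate, yields convergence guarantees.
Under these conditions, the method recovers the standard guarantees of gradient ascent on the undiscounted objective.
Offers insight into the behavior of the discounted approximation, a biased approximation of the policy gradient that is known not to be the gradient of any objective function.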
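For reference, the quantities involved can be written in a common episodic notation. This notation is an assumption and the paper's own may differ: horizon T, reward R_k received after action A_k, and a policy \pi_\theta. The discounted approximation is the gradient of the discounted objective with the \gamma^t weight on each term dropped; at \gamma = 1 it coincides with the gradient of the undiscounted objective, which is why driving \gamma toward 1 can recover the undiscounted guarantees.

```latex
G_t^{\gamma} = \sum_{k=t}^{T-1} \gamma^{k-t} R_k, \qquad
J_\gamma(\theta) = \mathbb{E}\Big[\sum_{t=0}^{T-1} \gamma^{t} R_t\Big]

\nabla J_\gamma(\theta) = \mathbb{E}\Big[\sum_{t=0}^{T-1} \gamma^{t}\, G_t^{\gamma}\, \nabla_\theta \log \pi_\theta(A_t \mid S_t)\Big]

\Delta_\gamma(\theta) = \mathbb{E}\Big[\sum_{t=0}^{T-1} G_t^{\gamma}\, \nabla_\theta \log \pi_\theta(A_t \mid S_t)\Big]
\quad \text{(discounted approximation: the factor } \gamma^{t} \text{ is dropped)}

\Delta_1(\theta) = \nabla J_1(\theta)
```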
Abstract
Many popular policy gradient methods for reinforcement learning follow a biased approximation of the policy gradient known as the discounted approximation. While it has been shown that the discounted approximation of the policy gradient is not the gradient of any objective function, little else is known about its convergence behavior or properties. In this paper, we show that if the discounted approximation is followed such that the discount factor is increased slowly at a rate related to a decreasing learning rate, the resulting method recovers the standard guarantees of gradient ascent on the undiscounted objective.
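A minimal sketch of the procedure the abstract describes, under loudly stated assumptions: NumPy, a toy stateless two-action episodic problem, and hypothetical schedules alpha_k = alpha0 / (k + 1) and gamma_k = 1 - 1/sqrt(k + 2). The paper's actual rate condition linking the discount factor to the learning rate is not reproduced here; every function name and schedule below is illustrative, not the author's implementation.

```python
import numpy as np


def discounted_returns(rewards, gamma):
    """Return G_t = sum_{k>=t} gamma**(k-t) * R_k for every t (backward recursion)."""
    returns = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def discounted_approximation(grad_log_probs, rewards, gamma):
    """Biased 'discounted approximation': sum_t G_t^gamma * grad log pi(A_t | S_t).

    Unlike the gradient of the discounted objective, no gamma**t weight
    multiplies each term. With gamma = 1 this is the ordinary REINFORCE
    estimate of the undiscounted gradient.
    """
    G = discounted_returns(rewards, gamma)
    return (G[:, None] * np.asarray(grad_log_probs, dtype=float)).sum(axis=0)


def schedules(k, alpha0=1.0):
    """HYPOTHETICAL schedules: the step size decays like 1/k while gamma
    rises toward 1 more slowly. Not the rate condition proved in the paper."""
    alpha = alpha0 / (k + 1)
    gamma = 1.0 - 1.0 / np.sqrt(k + 2)
    return alpha, gamma


def softmax(logits):
    z = np.exp(logits - logits.max())
    return z / z.sum()


def train(num_iters=2000, horizon=5, batch=16, seed=0):
    """Toy episodic problem (an assumption): a stateless 2-action softmax
    policy; action 1 earns reward 1 and action 0 earns 0 at every step."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(2)
    for k in range(num_iters):
        alpha, gamma = schedules(k)
        pi = softmax(theta)
        grad = np.zeros(2)
        for _ in range(batch):
            actions = (rng.random(horizon) < pi[1]).astype(int)
            rewards = actions.astype(float)
            # Gradient of log-softmax at the sampled action: one_hot(a) - pi.
            grad_log_probs = np.eye(2)[actions] - pi
            grad += discounted_approximation(grad_log_probs, rewards, gamma)
        # Ascent step along the batch-averaged discounted approximation.
        theta += alpha * grad / batch
    return softmax(theta)
```

Calling train() runs the loop and returns the final action probabilities; on this toy problem the probability of the rewarding action should rise above its initial value of 0.5. The batch average only reduces variance and is a design choice of the sketch, not part of the method.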
Taxonomy
Topics: Reinforcement Learning in Robotics
