On the Convergence of Discounted Policy Gradient Methods

Chris Nota

arXiv:2212.14066·cs.LG·January 10, 2023

Open Access

TL;DR

This paper investigates the convergence properties of discounted policy gradient methods in reinforcement learning. It shows that when the discount factor is increased slowly, at a rate related to a decreasing learning rate, these methods recover the standard guarantees of gradient ascent on the undiscounted objective.

Contribution

The paper provides a theoretical analysis showing that gradually increasing the discount factor allows discounted policy gradient methods to recover the standard guarantees of gradient ascent on the undiscounted objective.

Findings

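01. Slowly increasing the discount factor, at a rate related to a decreasing learning rate, yields convergence guarantees for the discounted approximation.

02. Under that schedule, the method recovers the standard guarantees of gradient ascent on the undiscounted objective.

03. The analysis sheds light on the behavior of a biased policy gradient approximation that is not the gradient of any objective function.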

Abstract

Many popular policy gradient methods for reinforcement learning follow a biased approximation of the policy gradient known as the discounted approximation. While it has been shown that the discounted approximation of the policy gradient is not the gradient of any objective function, little else is known about its convergence behavior or properties. In this paper, we show that if the discounted approximation is followed such that the discount factor is increased slowly at a rate related to a decreasing learning rate, the resulting method recovers the standard guarantees of gradient ascent on the undiscounted objective.
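The idea in the abstract can be illustrated with a minimal sketch. The code below is not the paper's algorithm or its schedule: the step size `alpha0 / sqrt(t + 1)`, the coupling `gamma_t = 1 - alpha_t`, and all function names are hypothetical choices made only to show a discount factor that rises toward 1 as the learning rate falls. The estimator shown is the commonly used discounted approximation, which weights each score term by a discounted return-to-go but omits the `gamma**t` factor that the gradient of the discounted objective would carry.

```python
import numpy as np


def learning_rate(t, alpha0=0.5):
    """Decreasing step size alpha_t = alpha0 / sqrt(t + 1) (hypothetical choice)."""
    return alpha0 / np.sqrt(t + 1)


def discount_factor(t, alpha0=0.5):
    """Hypothetical schedule gamma_t = 1 - alpha_t: rises toward 1 as alpha_t shrinks."""
    return 1.0 - learning_rate(t, alpha0)


def discounted_returns_to_go(rewards, gamma):
    """Return G_i = sum_{k >= i} gamma**(k - i) * rewards[k] for every step i."""
    returns = np.zeros(len(rewards))
    running = 0.0
    # Accumulate backward so each step reuses the return of the next step.
    for i in reversed(range(len(rewards))):
        running = rewards[i] + gamma * running
        returns[i] = running
    return returns


def discounted_approximation(grad_log_probs, rewards, gamma):
    """Sum over steps of grad log pi(a_i | s_i) * G_i, with no gamma**i factor.

    grad_log_probs has shape (T, d): one score vector per time step.
    """
    returns = discounted_returns_to_go(rewards, gamma)
    return (np.asarray(grad_log_probs) * returns[:, None]).sum(axis=0)
```

An update at iteration `t` would then take the form `theta += learning_rate(t) * discounted_approximation(scores, rewards, discount_factor(t))`, where `scores` and `rewards` come from a trajectory sampled under the current policy. How fast the discount factor may grow relative to the learning rate is exactly what the paper analyzes, so the coupling used here should be read as a placeholder.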

Taxonomy

Topics: Reinforcement Learning in Robotics