The Reinforce Policy Gradient Algorithm Revisited

Shalabh Bhatnagar

arXiv:2310.05000·cs.LG·October 10, 2023

TL;DR

This paper revisits the Reinforce policy gradient algorithm and proposes an enhancement that estimates the policy gradient from a function measurement at a perturbed parameter. The approach relaxes some of the regularity requirements that arise for systems with infinite state and action spaces, and the paper includes a convergence proof.

Contribution

The paper introduces a variant of the Reinforce algorithm that estimates the gradient from function measurements via random search, relaxing some of the regularity conditions otherwise needed to prove convergence and providing convergence guarantees.

Findings

01 The enhanced algorithm converges to a neighborhood of a local minimum.

02 The method is applicable to systems with infinite state and action spaces.

03 A formal proof of convergence is provided.

Abstract

We revisit the Reinforce policy gradient algorithm from the literature. Note that this algorithm typically works with cost returns obtained over random-length episodes, which arise either from termination upon reaching a goal state (as with episodic tasks) or from instants of visit to a prescribed recurrent state (in the case of continuing tasks). We propose a major enhancement to the basic algorithm. We estimate the policy gradient using a function measurement over a perturbed parameter by appealing to a class of random search approaches. This has advantages in the case of systems with infinite state and action spaces, as it relaxes some of the regularity requirements that would otherwise be needed for proving convergence of the Reinforce algorithm. Nonetheless, we observe that even though we estimate the gradient of the performance objective using the performance objective itself (and not…
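The idea of estimating a gradient from a single function measurement at a perturbed parameter can be illustrated with a small sketch. This is not the paper's algorithm: it assumes a one-measurement Gaussian smoothed-functional estimator, which is one standard member of the random search family, and it replaces the episodic cost return with a hypothetical noisy quadratic. The names `episode_cost`, `one_measurement_gradient`, `run_perturbed_search`, and `TARGET`, as well as the step sizes and the perturbation size `delta`, are all illustrative assumptions.

```python
import math
import random

# Hypothetical minimizer of the toy cost; not taken from the paper.
TARGET = (1.0, -2.0)


def episode_cost(theta, rng):
    """Noisy cost of one 'episode' under parameter theta (toy stand-in)."""
    exact = sum((t - c) ** 2 for t, c in zip(theta, TARGET))
    return exact + rng.gauss(0.0, 0.1)  # additive observation noise


def one_measurement_gradient(theta, delta, rng):
    """Estimate the gradient from ONE cost measurement at theta + delta * direction.

    Assumed form: (direction / delta) * cost, with a standard Gaussian direction.
    Its expectation is the gradient of a Gaussian-smoothed version of the cost.
    """
    direction = [rng.gauss(0.0, 1.0) for _ in theta]
    perturbed = [t + delta * d for t, d in zip(theta, direction)]
    cost = episode_cost(perturbed, rng)
    return [d * cost / delta for d in direction]


def run_perturbed_search(num_iters=20000, delta=0.5, seed=0):
    """Stochastic descent on the cost using only perturbed cost measurements."""
    rng = random.Random(seed)
    theta = [0.0, 0.0]
    for n in range(1, num_iters + 1):
        step = 1.0 / (n + 200)  # diminishing step size (an assumed schedule)
        grad = one_measurement_gradient(theta, delta, rng)
        theta = [t - step * g for t, g in zip(theta, grad)]  # costs are minimized
    return theta


if __name__ == "__main__":
    theta = run_perturbed_search()
    print("final parameter:", theta)
    print("distance to minimizer:", math.dist(theta, TARGET))
```

The update never differentiates the cost or the policy; it uses only cost values. In this sketch, `delta` trades bias against variance: a smaller value reduces the smoothing bias but scales the estimator's variance by roughly `1 / delta**2`.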

Taxonomy

Topics: Optimization and Search Problems · Age of Information Optimization · Reinforcement Learning in Robotics

Methods: Random Search