The Reinforce Policy Gradient Algorithm Revisited
Shalabh Bhatnagar

TL;DR
This paper revisits the Reinforce policy gradient algorithm and proposes a major enhancement: the policy gradient is estimated from a function measurement at a randomly perturbed parameter. This makes the algorithm easier to apply to systems with infinite state and action spaces, and the paper includes a convergence proof.
Contribution
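It introduces a new variant of the Reinforce algorithm that estimates the policy gradient from a function measurement at a perturbed parameter, using a random search approach. This relaxes some of the regularity conditions otherwise needed to prove convergence, and the variant comes with convergence guarantees.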
Findings
The enhanced algorithm converges to a neighborhood of a local minimum.
The method is applicable to systems with infinite state and action spaces.
A formal proof of convergence is provided.
Abstract
We revisit the Reinforce policy gradient algorithm from the literature. Note that this algorithm typically works with cost returns obtained over random-length episodes that result either from termination upon reaching a goal state (as with episodic tasks) or from instants of visit to a prescribed recurrent state (in the case of continuing tasks). We propose a major enhancement to the basic algorithm. We estimate the policy gradient using a function measurement over a perturbed parameter by appealing to a class of random search approaches. This has advantages in the case of systems with infinite state and action spaces as it relaxes some of the regularity requirements that would otherwise be needed for proving convergence of the Reinforce algorithm. Nonetheless, we observe that even though we estimate the gradient of the performance objective using the performance objective itself (and not…
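The perturbed-parameter gradient estimate described in the abstract can be sketched as follows. This is a minimal illustration, assuming a one-measurement Gaussian smoothed-functional estimator, a standard member of the random search family; the excerpt does not state which estimator the paper uses. It relies on the identity that, for Δ ~ N(0, I), the gradient of the smoothed cost E[J(θ + δΔ)] equals E[(Δ/δ) J(θ + δΔ)]. The names sf_gradient_estimate, sf_descent, and toy_cost, the step sizes, the box projection, and the quadratic toy cost are all hypothetical choices made for illustration and are not taken from the paper.

```python
import numpy as np


def sf_gradient_estimate(cost_fn, theta, delta, rng):
    """One-measurement Gaussian smoothed-functional gradient estimate.

    cost_fn is a hypothetical stand-in for a single (possibly noisy)
    episode cost return obtained with policy parameter theta.
    """
    perturbation = rng.standard_normal(theta.shape)  # Delta ~ N(0, I)
    cost = cost_fn(theta + delta * perturbation)     # single function measurement
    return (perturbation / delta) * cost


def sf_descent(cost_fn, theta0, n_steps, delta=0.5, bound=5.0, seed=0):
    """Projected stochastic descent driven by the estimate above.

    The step sizes 0.01 / (n + 1) and the projection onto the box
    [-bound, bound] are illustrative assumptions, not values from the paper.
    """
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float).copy()
    for n in range(n_steps):
        step = 0.01 / (n + 1)
        grad = sf_gradient_estimate(cost_fn, theta, delta, rng)
        # Move against the estimated gradient, then project back into the box.
        theta = np.clip(theta - step * grad, -bound, bound)
    return theta


if __name__ == "__main__":
    # Toy quadratic cost standing in for an episode cost return (assumption).
    def toy_cost(x):
        return float(np.sum((x - 1.0) ** 2))

    print(sf_descent(toy_cost, theta0=[3.0, -2.0], n_steps=5000))
```

Each update uses only one cost measurement and never differentiates the policy, which is what makes this kind of estimator attractive when regularity of the policy is hard to verify. The price is high variance in each estimate, which is why the sketch pairs it with small, decreasing step sizes and a projection.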
Taxonomy
Topics: Optimization and Search Problems · Age of Information Optimization · Reinforcement Learning in Robotics
Methods: Random Search
