Policy Gradient Methods for Reinforcement Learning with Function Approximation and Action-Dependent Baselines
Philip S. Thomas, Emma Brunskill

TL;DR
This paper extends the policy gradient theorem with function approximation from action-independent to action-dependent baselines, with the aim of reducing variance in gradient estimates.
Contribution
It introduces a way to incorporate action-dependent baselines into policy gradient methods with function approximation, building on prior work with action-independent baselines.
Findings
Action-dependent baselines can be integrated into policy gradients.
The method reduces variance in gradient estimates.
It generalizes previous policy gradient approaches.
Abstract
We show how an action-dependent baseline can be used by the policy gradient theorem using function approximation, originally presented with action-independent baselines by Sutton et al. (2000).
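To illustrate why the distinction between the two kinds of baseline matters, the sketch below computes the term that a baseline adds to the exact policy gradient in a single state, sum over actions of grad pi(a) * b(a). It is not the paper's construction: the softmax policy, the parameter values, and the baseline values are all assumptions chosen for illustration. For a baseline that is constant across actions the term vanishes, because the action probabilities sum to one; for a baseline that varies with the action it generally does not, which is the issue an action-dependent result has to address.

```python
import math

def softmax(theta):
    """Action probabilities of a softmax policy with one parameter per action."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]

def grad_pi(theta):
    """Jacobian jac[a][k] = d pi(a) / d theta_k of the softmax policy."""
    pi = softmax(theta)
    n = len(theta)
    return [[pi[a] * ((1.0 if a == k else 0.0) - pi[k]) for k in range(n)]
            for a in range(n)]

def baseline_term(theta, b):
    """sum_a grad pi(a) * b[a]: how far the baseline b shifts the exact gradient."""
    jac = grad_pi(theta)
    n = len(theta)
    return [sum(jac[a][k] * b[a] for a in range(n)) for k in range(n)]

# Hypothetical policy parameters for a single state with three actions.
theta = [0.2, -0.5, 1.0]

# Action-independent baseline: the same value for every action.
constant_shift = baseline_term(theta, [3.0, 3.0, 3.0])

# Action-dependent baseline: a different (arbitrary) value per action.
dependent_shift = baseline_term(theta, [3.0, -1.0, 0.5])

print("action-independent baseline term:", constant_shift)
print("action-dependent baseline term:  ", dependent_shift)
```

The first printed vector is zero up to floating-point error, while the second is not, so an arbitrary action-dependent baseline cannot simply be subtracted without some further condition or correction.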
Taxonomy
Topics: Reinforcement Learning in Robotics · Traffic control and management · Adaptive Dynamic Programming Control
