Policy Gradient Methods for Reinforcement Learning with Function Approximation and Action-Dependent Baselines

Philip S. Thomas; Emma Brunskill

arXiv:1706.06643 · cs.AI · June 22, 2017 · 43 citations

TL;DR

This paper extends the policy gradient theorem with function approximation so that it admits action-dependent baselines, a device commonly used to reduce the variance of gradient estimates in reinforcement learning.

Contribution

It introduces a way to incorporate action-dependent baselines into policy gradient methods with function approximation, building on prior work with action-independent baselines.

Findings

01

Action-dependent baselines can be integrated into policy gradients.

02

A well-chosen baseline can reduce the variance of gradient estimates.

03

It generalizes previous policy gradient approaches.
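
The difference between the two kinds of baseline can be written in one line. The identity below is the standard textbook argument, not the paper's own derivation, and the notation is assumed for illustration: $\pi_\theta(a \mid s)$ is the policy with parameters $\theta$, $b(s)$ is an action-independent baseline, and $b(s,a)$ is an action-dependent one.

```latex
\sum_a \nabla_\theta \pi_\theta(a \mid s)\, b(s)
  = b(s)\, \nabla_\theta \sum_a \pi_\theta(a \mid s)
  = b(s)\, \nabla_\theta 1
  = 0,
\qquad \text{whereas} \qquad
\sum_a \nabla_\theta \pi_\theta(a \mid s)\, b(s,a) \neq 0 \ \text{in general.}
```

Subtracting $b(s)$ therefore leaves the expected gradient unchanged, while subtracting $b(s,a)$ shifts it unless the second sum is accounted for.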

Abstract

We show how an action-dependent baseline can be used with the policy gradient theorem with function approximation, which Sutton et al. (2000) originally presented with action-independent baselines.
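
To make the distinction concrete, here is a minimal numerical sketch. It is an illustration under stated assumptions, not the construction used in the paper: a single state, a softmax policy over four actions, and made-up action values `q`. All names (`expected_gradient`, `b_state`, `b_action`) are hypothetical. The sketch computes exact expectations, so no sampling noise is involved.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    e = np.exp(logits - logits.max())
    return e / e.sum()

def expected_gradient(pi, signal):
    """Exact E_{a~pi}[grad_theta log pi(a) * signal(a)] for softmax logits theta."""
    # For a softmax policy, grad_theta log pi(a) = onehot(a) - pi.
    # Row a of `score` holds that gradient.
    score = np.eye(len(pi)) - pi
    return (pi * signal) @ score

rng = np.random.default_rng(0)
theta = rng.normal(size=4)      # assumed policy logits
q = rng.normal(size=4)          # assumed action values for the single state
pi = softmax(theta)

# Reference gradient with no baseline.
true_grad = expected_gradient(pi, q)

# Action-independent baseline: the expected gradient is unchanged.
b_state = 0.7
grad_state = expected_gradient(pi, q - b_state)

# Action-dependent baseline: subtracting it alone shifts the expected gradient.
b_action = rng.normal(size=4)
grad_action_naive = expected_gradient(pi, q - b_action)

# Adding back the expectation of the baseline term restores the gradient.
correction = expected_gradient(pi, b_action)
grad_action_corrected = grad_action_naive + correction
```

In this toy setting the correction term is available in closed form. In practice, whether an action-dependent baseline helps depends on how cheaply that term can be computed or estimated.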

Taxonomy

Topics: Reinforcement Learning in Robotics · Traffic control and management · Adaptive Dynamic Programming Control