Investigation on the generalization of the Sampled Policy Gradient algorithm

Nil Stolt Ansó

arXiv:1910.03728 · cs.LG · October 10, 2019

TL;DR

This paper investigates how well the Sampled Policy Gradient (SPG) algorithm generalizes, comparing it with CACLA and DPG across two environments, two network architectures, and on-policy versus experience-buffer training.

Contribution

It provides an empirical comparison of SPG with CACLA and DPG, highlighting its theoretical benefits and limitations in different settings.

Findings

01

SPG is often not the worst-performing algorithm, but it does not always match the best-performing algorithm on a given task.

02

Performance varies depending on environment and network architecture.

03

Further experiments are needed to fully understand SPG's strengths and weaknesses.

Abstract

The Sampled Policy Gradient (SPG) algorithm is a new offline actor-critic variant that samples in the action space to approximate the policy gradient. It does so by using the critic to evaluate the sampled actions. SPG offers theoretical promise over similar algorithms such as DPG because it searches the action-Q-value space independently of the local gradient, enabling it to avoid local optima. This paper aims to compare SPG to two similar actor-critic algorithms, CACLA and DPG. The comparison is made across two different environments and two different network architectures, and between training on on-policy transitions and training from an experience buffer. Results suggest that although SPG is often not the worst performer, it does not always match the performance of the best-performing algorithm at a particular task. Further experiments are required to get a better estimate of the…
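The abstract only states that SPG samples in the action space and uses the critic to evaluate the samples. The following is a minimal sketch of that idea, not the paper's implementation. It assumes Gaussian perturbations around the actor's current action and a best-of-N selection rule. The names `toy_critic`, `spg_actor_target`, `n_samples`, and `sigma` are hypothetical, and the quadratic critic is a stand-in for a learned Q-network.

```python
import numpy as np


def toy_critic(state, action):
    """Hypothetical stand-in critic: Q is highest when action == 0.5 * state."""
    return -np.sum((action - 0.5 * state) ** 2, axis=-1)


def spg_actor_target(state, actor_action, critic, n_samples=16, sigma=0.3, rng=None):
    """Sample actions near the actor's output and score them with the critic.

    Returns the best sampled action if the critic rates it above the actor's
    own action, otherwise None (meaning: no actor update for this state).
    The Gaussian sampling scheme is an assumption made for illustration.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Candidate actions: the actor's action plus Gaussian noise.
    noise = rng.normal(0.0, sigma, size=(n_samples, actor_action.shape[0]))
    candidates = actor_action + noise
    # The critic evaluates every sampled action for the same state.
    q_candidates = critic(state, candidates)
    best = int(np.argmax(q_candidates))
    # Only move the actor toward a sample that the critic prefers.
    if q_candidates[best] > critic(state, actor_action):
        return candidates[best]
    return None


rng = np.random.default_rng(0)
state = np.array([1.0, -2.0])
actor_action = np.zeros(2)  # pretend this is the actor's current output
target = spg_actor_target(state, actor_action, toy_critic, rng=rng)
```

In a full training loop, a non-`None` target would typically serve as a regression target for the actor network on that state. Because the samples are chosen without reference to the critic's gradient, the search is not tied to the local slope of Q, which is the property the abstract contrasts with DPG.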

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Model Reduction and Neural Networks

Methods: Deterministic Policy Gradient