Marginal Policy Gradients: A Unified Family of Estimators for Bounded Action Spaces with Applications

Carson Eisenach; Haichuan Yang; Ji Liu; Han Liu

arXiv:1806.05134·cs.LG·February 19, 2019·5 cites

TL;DR

This paper introduces the angular policy gradient (APG), a new variance-reduced policy gradient method for directional control in reinforcement learning, providing a unified analysis with existing estimators and demonstrating improved performance in experiments.

Contribution

It presents the APG estimator for directional control, unifies variance reduction analysis for policy gradients, and extends variance reduction techniques to new action spaces.

Findings

1. APG significantly reduces variance compared to standard policy gradients.

2. Experimental results show that APG outperforms existing methods on an RTS game and on navigation tasks.

3. The unified analysis offers stronger guarantees for variance reduction in policy gradient estimators.

Abstract

Many complex domains, such as robotics control and real-time strategy (RTS) games, require an agent to learn a continuous control. In the former, an agent learns a policy over $\mathbb{R}^{d}$ and in the latter, over a discrete set of actions, each of which is parametrized by a continuous parameter. Such problems are naturally solved using policy-based reinforcement learning (RL) methods, but unfortunately these often suffer from high variance, leading to instability and slow convergence. Unnecessary variance is introduced whenever policies over bounded action spaces are modeled using distributions with unbounded support by applying a transformation $T$ to the sampled action before execution in the environment. Recently, the variance-reduced clipped action policy gradient (CAPG) was introduced for actions in bounded intervals, but to date no variance-reduced methods exist when the action is…
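The source of "unnecessary variance" described in the abstract can be illustrated with a small simulation. The sketch below is not taken from the paper or from the ceisenach/MPG repository; it assumes a one-dimensional Gaussian policy with fixed standard deviation, a clipping transformation T(a) = clip(a, -1, 1), and a toy reward, all chosen purely for illustration. It compares the usual score-function estimator, which differentiates the log-density of the unclipped sample a, with an estimator that differentiates the log-probability of the executed action T(a), in the spirit of the clipped-action setting the abstract mentions. All function names are hypothetical.

```python
import math
import random


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def clip(a, lo=-1.0, hi=1.0):
    """Transformation T: project the sampled action onto [lo, hi]."""
    return max(lo, min(hi, a))


def standard_score(a, mu, sigma):
    """d/dmu of log N(a; mu, sigma^2), using the unclipped sample a."""
    return (a - mu) / sigma ** 2


def marginal_score(a, mu, sigma, lo=-1.0, hi=1.0):
    """d/dmu of the log-probability of the executed action T(a).

    Inside (lo, hi) this equals the Gaussian score. At a boundary, T(a)
    has a point mass, so the score is the derivative of the log of that
    mass and no longer depends on how far a overshot the bound.
    """
    if a <= lo:
        z = (lo - mu) / sigma
        return -normal_pdf(z) / (sigma * normal_cdf(z))
    if a >= hi:
        z = (hi - mu) / sigma
        return normal_pdf(z) / (sigma * (1.0 - normal_cdf(z)))
    return (a - mu) / sigma ** 2


def reward(x):
    """Toy reward (an assumption for this sketch), depends only on T(a)."""
    return -(x - 0.5) ** 2


def mean(xs):
    return sum(xs) / len(xs)


def variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def compare(mu=0.8, sigma=1.0, n=100_000, seed=0):
    """Return (mean, variance) of both single-sample gradient estimates."""
    rng = random.Random(seed)
    std_est, marg_est = [], []
    for _ in range(n):
        a = rng.gauss(mu, sigma)      # sample from the unbounded policy
        r = reward(clip(a))           # environment only sees T(a)
        std_est.append(r * standard_score(a, mu, sigma))
        marg_est.append(r * marginal_score(a, mu, sigma))
    return mean(std_est), variance(std_est), mean(marg_est), variance(marg_est)


if __name__ == "__main__":
    m_std, v_std, m_marg, v_marg = compare()
    print("standard estimator: mean", m_std, "variance", v_std)
    print("marginal estimator: mean", m_marg, "variance", v_marg)
```

Under these assumptions the two estimators should agree in expectation, while the second has lower variance: its score is the conditional expectation of the first given T(a), and the reward depends on a only through T(a).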

Code & Models

Repositories

ceisenach/MPG (PyTorch, official)

Taxonomy

Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Adversarial Robustness in Machine Learning