Marginal Policy Gradients: A Unified Family of Estimators for Bounded Action Spaces with Applications
Carson Eisenach, Haichuan Yang, Ji Liu, Han Liu

TL;DR
This paper introduces the angular policy gradient (APG), a new variance-reduced policy gradient method for directional control in reinforcement learning, providing a unified analysis with existing estimators and demonstrating improved performance in experiments.
Contribution
It presents the APG estimator for directional control, unifies variance reduction analysis for policy gradients, and extends variance reduction techniques to new action spaces.
Findings
APG significantly reduces variance compared to standard policy gradients.
Experimental results show APG outperforms existing methods in RTS game and navigation tasks.
Unified analysis offers stronger guarantees for variance reduction in policy gradient estimators.
Abstract
Many complex domains, such as robotics control and real-time strategy (RTS) games, require an agent to learn a continuous control. In the former, an agent learns a policy over $\mathbb{R}^d$ and in the latter, over a discrete set of actions each of which is parametrized by a continuous parameter. Such problems are naturally solved using policy-based reinforcement learning (RL) methods, but unfortunately these often suffer from high variance, leading to instability and slow convergence. Unnecessary variance is introduced whenever policies over bounded action spaces are modeled using distributions with unbounded support by applying a transformation to the sampled action before execution in the environment. Recently, the variance-reduced clipped action policy gradient (CAPG) was introduced for actions in bounded intervals, but to date no variance-reduced methods exist when the action is…
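The variance issue described in the abstract can be illustrated with a minimal one-dimensional sketch. It assumes a Gaussian policy whose sampled action is clipped to an interval before execution, and compares the usual score-function gradient with a clipped-action style estimator that differentiates the log-probability of the executed (clipped) action instead. This is only an illustration in the spirit of CAPG, not the paper's APG estimator or its experimental setup; the reward function, the parameter values, and all function names are hypothetical.

```python
import math
import random
import statistics


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def reward(executed_action):
    # Hypothetical reward, chosen only for illustration.
    return -(executed_action - 0.5) ** 2


def gradient_samples(mu=0.8, sigma=1.0, lo=-1.0, hi=1.0, n=50_000, seed=0):
    """Per-sample estimates of d/dmu E[reward] under both estimators."""
    rng = random.Random(seed)
    z_lo, z_hi = (lo - mu) / sigma, (hi - mu) / sigma
    # d/dmu of the log-probability mass that clipping puts on each boundary.
    score_lo = -normal_pdf(z_lo) / (sigma * normal_cdf(z_lo))
    score_hi = normal_pdf(z_hi) / (sigma * (1.0 - normal_cdf(z_hi)))

    standard, marginal = [], []
    for _ in range(n):
        a = rng.gauss(mu, sigma)           # sample with unbounded support
        r = reward(min(max(a, lo), hi))    # environment only sees the clipped action
        raw_score = (a - mu) / sigma ** 2  # d/dmu log N(a; mu, sigma^2)
        standard.append(r * raw_score)
        if a <= lo:
            marginal.append(r * score_lo)
        elif a >= hi:
            marginal.append(r * score_hi)
        else:
            marginal.append(r * raw_score)
    return standard, marginal


standard, marginal = gradient_samples()
print("mean:    ", statistics.fmean(standard), statistics.fmean(marginal))
print("variance:", statistics.pvariance(standard), statistics.pvariance(marginal))
```

Under these assumptions both estimators target the same gradient, so their sample means should roughly agree, while the second should show a smaller sample variance because all pre-clipping samples that map to the same boundary share a single score value.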
Taxonomy
Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Adversarial Robustness in Machine Learning
