Hierarchical Approaches for Reinforcement Learning in Parameterized Action Space
Ermo Wei, Drew Wicke, Sean Luke

TL;DR
This paper introduces a hierarchical deep reinforcement learning architecture for parameterized action spaces, along with two training methods based on TRPO and SVG that outperform Parameterized Action DDPG on the test domains.
Contribution
It proposes a new compact architecture in which the parameter policy is conditioned on the output of the discrete action policy, and introduces two training methods that improve sample efficiency and performance.
Findings
Outperforms Parameterized Action DDPG on test domains
Effective end-to-end training in parameterized action spaces
Hierarchical architecture improves learning efficiency
Abstract
We explore Deep Reinforcement Learning in a parameterized action space. Specifically, we investigate how to achieve sample-efficient end-to-end training in these tasks. We propose a new compact architecture for these tasks in which the parameter policy is conditioned on the output of the discrete action policy. We also propose two new methods, based on the state-of-the-art algorithms Trust Region Policy Optimization (TRPO) and Stochastic Value Gradient (SVG), to train such an architecture. We demonstrate that these methods outperform the state-of-the-art method, Parameterized Action DDPG, on test domains.
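The conditioning described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes the parameter policy receives the state concatenated with a one-hot encoding of the sampled discrete action, uses single linear layers in place of deep networks, and squashes parameters with tanh. The class name `HierarchicalPolicy` and all dimensions are hypothetical, and no training (TRPO or SVG) is shown.

```python
import math
import random

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def linear(weights, bias, x):
    # Plain matrix-vector product plus bias.
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def init_layer(rng, n_out, n_in, scale=0.1):
    # Small random weights, zero bias (illustrative initialization only).
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out

class HierarchicalPolicy:
    """Hypothetical two-level policy: discrete action first, parameters second."""

    def __init__(self, state_dim, n_actions, param_dim, seed=0):
        self.rng = random.Random(seed)
        self.n_actions = n_actions
        # Discrete action policy: state -> action logits.
        self.discrete = init_layer(self.rng, n_actions, state_dim)
        # Parameter policy: (state, one-hot action) -> continuous parameters.
        # Assumption: conditioning is done by concatenation.
        self.param = init_layer(self.rng, param_dim, state_dim + n_actions)

    def act(self, state):
        probs = softmax(linear(*self.discrete, state))
        action = self.rng.choices(range(self.n_actions), weights=probs)[0]
        one_hot = [1.0 if i == action else 0.0 for i in range(self.n_actions)]
        raw = linear(*self.param, list(state) + one_hot)
        # Assumption: parameters are bounded to [-1, 1] via tanh.
        params = [math.tanh(v) for v in raw]
        return action, params, probs

policy = HierarchicalPolicy(state_dim=4, n_actions=3, param_dim=2)
action, params, probs = policy.act([0.1, -0.2, 0.3, 0.0])
```

Because the parameter head sees which discrete action was chosen, a single compact network can produce action-specific parameters rather than emitting parameters for every action at once.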
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Machine Learning and Data Classification
Methods: Weight Decay · Convolution · Adam · Dense Connections · Batch Normalization · Experience Replay · Deep Deterministic Policy Gradient
