Hierarchical Approaches for Reinforcement Learning in Parameterized Action Space

Ermo Wei; Drew Wicke; Sean Luke

arXiv:1810.09656 · cs.LG · October 24, 2018 · 6 cites


TL;DR

This paper introduces a hierarchical deep reinforcement learning architecture for parameterized action spaces, together with two training methods based on TRPO and SVG that outperform Parameterized Action DDPG on the test domains.

Contribution

It proposes a new compact architecture conditioned on discrete actions and introduces two training methods that improve sample efficiency and performance.

Findings

01. Outperforms Parameterized Action DDPG on test domains

02. Effective end-to-end training in parameterized action spaces

03. Hierarchical architecture improves learning efficiency

Abstract

We explore Deep Reinforcement Learning in a parameterized action space. Specifically, we investigate how to achieve sample-efficient end-to-end training in these tasks. We propose a new compact architecture for these tasks in which the parameter policy is conditioned on the output of the discrete action policy. We also propose two new methods based on the state-of-the-art algorithms Trust Region Policy Optimization (TRPO) and Stochastic Value Gradient (SVG) to train such an architecture. We demonstrate that these methods outperform the state-of-the-art method, Parameterized Action DDPG, on test domains.
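
To make the conditioning idea in the abstract concrete, here is a minimal sketch of a two-level policy in which the continuous parameters are generated only after the discrete action has been chosen. It is an illustration under stated assumptions, not the authors' implementation: the class name `HierarchicalPolicy`, the layer sizes, the one-hot conditioning, and the fixed-variance Gaussian over parameters are all hypothetical choices, and no training step (TRPO or SVG) is shown.

```python
import numpy as np


def init_layer(rng, n_in, n_out):
    """Small random weights and zero biases for one dense layer."""
    return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)


def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()


class HierarchicalPolicy:
    """Hypothetical two-level policy: discrete action first, then its parameters."""

    def __init__(self, state_dim, n_discrete, param_dim, hidden=32, seed=0):
        self.rng = np.random.default_rng(seed)
        self.n_discrete = n_discrete
        # Discrete action policy: state -> action probabilities.
        self.W1, self.b1 = init_layer(self.rng, state_dim, hidden)
        self.W2, self.b2 = init_layer(self.rng, hidden, n_discrete)
        # Parameter policy: (state, one-hot action) -> parameter mean.
        self.V1, self.c1 = init_layer(self.rng, state_dim + n_discrete, hidden)
        self.V2, self.c2 = init_layer(self.rng, hidden, param_dim)
        # Assumed state-independent log standard deviation.
        self.log_std = np.zeros(param_dim)

    def discrete_probs(self, state):
        h = np.tanh(state @ self.W1 + self.b1)
        return softmax(h @ self.W2 + self.b2)

    def param_mean(self, state, action):
        # Conditioning step: the chosen discrete action is an input here.
        one_hot = np.eye(self.n_discrete)[action]
        h = np.tanh(np.concatenate([state, one_hot]) @ self.V1 + self.c1)
        return h @ self.V2 + self.c2

    def act(self, state):
        probs = self.discrete_probs(state)
        action = int(self.rng.choice(self.n_discrete, p=probs))
        mean = self.param_mean(state, action)
        noise = self.rng.standard_normal(mean.shape)
        params = mean + np.exp(self.log_std) * noise
        return action, params
```

Calling `HierarchicalPolicy(state_dim=4, n_discrete=3, param_dim=2).act(np.ones(4))` returns a pair of a discrete action index and a length-2 parameter vector. Because the action enters `param_mean` as an input, a single parameter network can serve every discrete action, which is one plausible reading of "compact" in the abstract.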

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Machine Learning and Data Classification

Methods: Weight Decay · Convolution · Adam · Dense Connections · Batch Normalization · Experience Replay · Deep Deterministic Policy Gradient