Multi-Level Discovery of Deep Options

Roy Fox; Sanjay Krishnan; Ion Stoica; Ken Goldberg

arXiv:1703.08294 · cs.LG · October 6, 2017 · 70 cites

TL;DR

This paper introduces Discovery of Deep Options (DDO), a policy-gradient algorithm that discovers parametrized options from demonstration trajectories and can be applied recursively to build multi-level hierarchies, with the aim of reducing the sample complexity of reinforcement learning.

Contribution

The paper presents a novel recursive approach for discovering multi-level deep options from demonstrations, enabling scalable hierarchical reinforcement learning.

Findings

01. Accelerates learning in 4 out of 5 Atari environments

02. Discovers structure in surgical videos that matches expert annotations with 72% accuracy

03. Scales to multi-level hierarchies by decoupling low-level option discovery from high-level meta-control policy learning

Abstract

Augmenting an agent's control with useful higher-level behaviors called options can greatly reduce the sample complexity of reinforcement learning, but manually designing options is infeasible in high-dimensional and abstract state spaces. While recent work has proposed several techniques for automated option discovery, they do not scale to multi-level hierarchies and to expressive representations such as deep networks. We present Discovery of Deep Options (DDO), a policy-gradient algorithm that discovers parametrized options from a set of demonstration trajectories, and can be used recursively to discover additional levels of the hierarchy. The scalability of our approach to multi-level hierarchies stems from the decoupling of low-level option discovery from high-level meta-control policy learning, facilitated by under-parametrization of the high level. We demonstrate that using the…
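As a rough illustration of what "options" means in the abstract, here is a minimal sketch of how a two-level options hierarchy executes: a high-level meta-control policy picks an option, the option's own policy chooses low-level actions, and a termination rule hands control back to the high level. This is not DDO and not the authors' code; the corridor environment, the `Option` class, and the names `rollout`, `step`, `meta_policy`, `go_right`, and `go_left` are all hypothetical, and the options are hand-written rather than discovered from demonstrations.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple

State = int
Action = int


@dataclass
class Option:
    """A temporally extended behavior: a control policy plus a termination rule."""
    policy: Callable[[State], Action]         # low-level action choice
    terminate_prob: Callable[[State], float]  # probability of stopping in a state


def rollout(
    options: List[Option],
    meta_policy: Callable[[State, random.Random], int],
    step: Callable[[State, Action], State],
    start: State,
    horizon: int,
    rng: random.Random,
) -> List[Tuple[State, int, Action]]:
    """Run the hierarchy and return (state, option_index, action) triples."""
    trajectory = []
    state = start
    active = meta_policy(state, rng)  # the high level picks the first option
    for _ in range(horizon):
        action = options[active].policy(state)
        trajectory.append((state, active, action))
        state = step(state, action)
        # After each step the active option may terminate, which returns
        # control to the high-level meta-control policy.
        if rng.random() < options[active].terminate_prob(state):
            active = meta_policy(state, rng)
    return trajectory


# Hypothetical toy environment: a 1-D corridor with positions 0..10.
def step(state: State, action: Action) -> State:
    return max(0, min(10, state + action))


# Hand-written options (assumption: in DDO these would be learned networks).
go_right = Option(policy=lambda s: +1,
                  terminate_prob=lambda s: 1.0 if s >= 10 else 0.0)
go_left = Option(policy=lambda s: -1,
                 terminate_prob=lambda s: 1.0 if s <= 0 else 0.0)


def meta_policy(state: State, rng: random.Random) -> int:
    # Deterministic here for clarity: head for the far wall.
    return 0 if state < 5 else 1


trajectory = rollout([go_right, go_left], meta_policy, step,
                     start=0, horizon=25, rng=random.Random(0))
print([option for _, option, _ in trajectory])
```

In this sketch the meta-control policy is consulted only when an option terminates, so it makes far fewer decisions than there are time steps. That is the sense in which options add higher-level behaviors on top of an agent's primitive actions.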

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Reservoir Engineering and Simulation Methods