Discovery of Options via Meta-Learned Subgoals

Vivek Veeriah; Tom Zahavy; Matteo Hessel; Zhongwen Xu; Junhyuk Oh; Iurii Kemaev; Hado van Hasselt; David Silver; Satinder Singh

arXiv:2102.06741 · cs.LG · February 16, 2021 · 5 cites

TL;DR

This paper presents a meta-gradient method for discovering useful, diverse options in multi-task reinforcement learning, enabling agents to learn faster and adapt better to new tasks by automatically identifying subgoals.

Contribution

Introduces a novel meta-gradient approach for discovering options through interaction, using a manager-worker framework with neural network parameterized subgoal functions.
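As a rough illustration of what "meta-gradient" means here, the toy below differentiates an outer loss through one inner gradient step. It is a minimal scalar sketch under loud assumptions, not the paper's method: `eta` stands in for the subgoal parameters, `theta` for the parameters trained on that subgoal, and the quadratic losses, the names, and the step sizes `alpha` and `beta` are all hypothetical. The paper itself uses neural networks and RL objectives.

```python
def inner_update(theta, eta, alpha):
    """One gradient step on the (assumed) inner loss (theta - eta)^2."""
    return theta - alpha * 2.0 * (theta - eta)


def meta_gradient(theta, eta, goal, alpha):
    """Gradient w.r.t. eta of the (assumed) outer loss (theta' - goal)^2,
    where theta' = inner_update(theta, eta, alpha)."""
    theta_new = inner_update(theta, eta, alpha)
    d_outer_d_theta_new = 2.0 * (theta_new - goal)
    d_theta_new_d_eta = 2.0 * alpha  # derivative of the inner step w.r.t. eta
    return d_outer_d_theta_new * d_theta_new_d_eta  # chain rule


def train(goal=3.0, alpha=0.1, beta=0.5, steps=500):
    """Alternate a meta-gradient step on eta with an inner step on theta."""
    theta, eta = 0.0, 0.0
    for _ in range(steps):
        eta -= beta * meta_gradient(theta, eta, goal, alpha)
        theta = inner_update(theta, eta, alpha)
    return theta, eta
```

In this toy, `eta` is never told the outer goal directly; it is adjusted only through its effect on the inner update, which is the sense in which the subgoal is trained for its usefulness.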

Findings

1. Discovered options are meaningful and diverse in multi-task RL.
2. Options are frequently used during training.
3. Discovered options accelerate learning in new tasks.

Abstract

Temporal abstractions in the form of options have been shown to help reinforcement learning (RL) agents learn faster. However, despite prior work on this topic, the problem of discovering options through interaction with an environment remains a challenge. In this paper, we introduce a novel meta-gradient approach for discovering useful options in multi-task RL environments. Our approach is based on a manager-worker decomposition of the RL agent, in which a manager maximises rewards from the environment by learning a task-dependent policy over both a set of task-independent discovered-options and primitive actions. The option-reward and termination functions that define a subgoal for each option are parameterised as neural networks and trained via meta-gradients to maximise their usefulness. Empirical analysis on gridworld and DeepMind Lab tasks shows that: (1) our approach can discover…
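The manager-worker decomposition described in the abstract can be pictured with a small call-and-return loop: the manager picks either a primitive action or an option, and a chosen option runs until its termination condition fires. The sketch below is only an illustration under stated assumptions. The corridor environment, the `Option` class, `demo_manager`, and the hand-coded subgoal are hypothetical; in the paper the manager's policy is learned, and the option-reward and termination functions are neural networks trained via meta-gradients.

```python
PRIMITIVES = ("left", "right")


def step(state, action, size):
    """Move one cell in a corridor with cells 0..size-1, clipping at the walls."""
    delta = -1 if action == "left" else 1
    return max(0, min(size - 1, state + delta))


class Option:
    """A temporally extended action defined by a subgoal (hand-coded here)."""

    def __init__(self, subgoal):
        self.subgoal = subgoal

    def policy(self, state):
        """Worker policy: walk toward the subgoal cell."""
        return "right" if state < self.subgoal else "left"

    def terminates(self, state):
        """Termination function: stop once the subgoal is reached."""
        return state == self.subgoal

    def option_reward(self, state):
        """Option-reward: the signal a worker would be trained on (unused below)."""
        return 1.0 if state == self.subgoal else 0.0


def demo_manager(state):
    """Hypothetical fixed manager: call option 0 early on, then act primitively."""
    return 0 if state < 5 else "right"


def run_episode(manager, options, start, goal, size, max_steps=100):
    """Call-and-return execution over primitive actions and options."""
    state, steps, decisions = start, 0, 0
    while state != goal and steps < max_steps:
        choice = manager(state)
        decisions += 1
        if choice in PRIMITIVES:
            state = step(state, choice, size)
            steps += 1
        else:
            option = options[choice]
            # Follow the option's policy until it terminates or the task ends.
            while steps < max_steps:
                state = step(state, option.policy(state), size)
                steps += 1
                if option.terminates(state) or state == goal:
                    break
    return state, steps, decisions
```

For example, `run_episode(demo_manager, [Option(subgoal=5)], start=0, goal=9, size=10)` reaches the goal in nine environment steps while the manager makes only five decisions, which is the kind of temporal abstraction options are meant to provide.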

Videos

Discovery of Options via Meta-Learned Subgoals · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Data Stream Mining Techniques