Discovery of Options via Meta-Learned Subgoals
Vivek Veeriah, Tom Zahavy, Matteo Hessel, Zhongwen Xu, Junhyuk Oh, Iurii Kemaev, Hado van Hasselt, David Silver, Satinder Singh

TL;DR
This paper presents a meta-gradient method for discovering useful, diverse options in multi-task reinforcement learning, enabling agents to learn faster and adapt better to new tasks by automatically identifying subgoals.
Contribution
Introduces a novel meta-gradient approach for discovering options through interaction, using a manager-worker framework in which each option's subgoal functions are parameterized as neural networks.
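As a rough illustration of the manager-worker split, the sketch below shows call-and-return execution. At each decision the manager picks either a primitive action or an option, and a chosen option keeps acting until its termination test fires. Everything here is a hypothetical stand-in, assuming a 1-D corridor environment with hand-coded option policies and termination tests aimed at a fixed target cell (`Option`, `execute` and `run_plan` are illustrative names). In the paper the option-reward and termination functions are neural networks learned via meta-gradients, which this sketch does not model.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

# Primitive actions in a hypothetical 1-D corridor: -1 = step left, +1 = step right.
Action = int


@dataclass
class Option:
    """Hypothetical option: a worker policy plus a termination test.

    Stand-in only: the paper learns each option's subgoal (option reward) and
    termination with neural networks; here both are hard-coded to a target cell.
    """
    target: int

    def policy(self, s: int) -> Action:
        # Worker behaviour for this option: move toward the target cell.
        return 1 if s < self.target else -1

    def terminates(self, s: int) -> bool:
        # Deterministic stand-in for a learned termination probability beta(s).
        return s == self.target


def step(s: int, a: Action, size: int) -> int:
    # Corridor dynamics: move by one cell, clipped to [0, size - 1].
    return max(0, min(size - 1, s + a))


def execute(s: int, choice: Union[Option, Action], size: int,
            max_steps: int = 100) -> Tuple[int, int]:
    """Run one manager decision; return (next state, primitive steps used)."""
    if isinstance(choice, Option):
        steps = 0
        while True:
            # Call-and-return: act with the option's policy, then test termination
            # in the new state (capped by max_steps as a safety net).
            s = step(s, choice.policy(s), size)
            steps += 1
            if choice.terminates(s) or steps >= max_steps:
                return s, steps
    # A primitive action lasts exactly one environment step.
    return step(s, choice, size), 1


def run_plan(s: int, plan: List[Union[Option, Action]], size: int) -> Tuple[int, int]:
    """Execute a fixed list of manager decisions (a learned manager would choose them)."""
    total = 0
    for choice in plan:
        s, used = execute(s, choice, size)
        total += used
    return s, total
```

For example, `run_plan(0, [Option(target=7), -1], size=10)` runs the option for 7 primitive steps and then takes one primitive step left, ending in cell 6 after 8 environment steps. The manager makes only 2 decisions for those 8 steps, which is the kind of temporal abstraction options are meant to provide.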
Findings
Discovered options are meaningful and diverse in multi-task RL.
Options are frequently used during training.
Discovered options accelerate learning in new tasks.
Abstract
Temporal abstractions in the form of options have been shown to help reinforcement learning (RL) agents learn faster. However, despite prior work on this topic, the problem of discovering options through interaction with an environment remains a challenge. In this paper, we introduce a novel meta-gradient approach for discovering useful options in multi-task RL environments. Our approach is based on a manager-worker decomposition of the RL agent, in which a manager maximises rewards from the environment by learning a task-dependent policy over both a set of task-independent discovered-options and primitive actions. The option-reward and termination functions that define a subgoal for each option are parameterised as neural networks and trained via meta-gradients to maximise their usefulness. Empirical analysis on gridworld and DeepMind Lab tasks shows that: (1) our approach can discover…
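The meta-gradient idea in the abstract can be illustrated on a toy scalar problem. A meta-parameter `eta` defines the worker's objective (standing in for an option reward), the worker takes one gradient step on that objective, and `eta` is then updated by differentiating an outer, manager-side objective through that inner step. This is a minimal sketch under strong assumptions, not the paper's algorithm: the quadratic losses, the target `c`, the step sizes `alpha` and `beta`, and all function names are hypothetical, and the derivative is worked out by hand rather than with an autodiff library. With inner update theta' = theta - alpha * (theta - eta) and outer loss 0.5 * (theta' - c)^2, the chain rule gives d(outer)/d(eta) = (theta' - c) * alpha.

```python
def inner_update(theta: float, eta: float, alpha: float) -> float:
    """Worker step: one gradient step on the hypothetical loss 0.5*(theta - eta)**2."""
    return theta - alpha * (theta - eta)


def outer_loss(theta_new: float, c: float) -> float:
    """Hypothetical stand-in for the manager's objective, evaluated after the worker update."""
    return 0.5 * (theta_new - c) ** 2


def meta_gradient(theta: float, eta: float, alpha: float, c: float) -> float:
    """d outer_loss(inner_update(theta, eta, alpha), c) / d eta, by the chain rule."""
    theta_new = inner_update(theta, eta, alpha)
    d_outer_d_theta_new = theta_new - c  # derivative of the outer loss
    d_theta_new_d_eta = alpha            # derivative of the inner update w.r.t. eta
    return d_outer_d_theta_new * d_theta_new_d_eta


def train(theta: float = 0.0, eta: float = 0.0, c: float = 3.0,
          alpha: float = 0.5, beta: float = 1.0, steps: int = 200):
    """Alternate worker updates on eta with meta-updates of eta itself."""
    for _ in range(steps):
        g = meta_gradient(theta, eta, alpha, c)  # gradient at the current (theta, eta)
        theta = inner_update(theta, eta, alpha)  # worker learns from the current subgoal
        eta -= beta * g                          # meta-update of the subgoal parameter
    return theta, eta
```

With the defaults, `train()` drives both `theta` and `eta` toward `c = 3.0`: the subgoal parameter settles where training the worker on it also minimises the outer objective, loosely mirroring how the paper tunes subgoals for their usefulness to the manager.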
