Reusable Options through Gradient-based Meta Learning
David Kuric, Herke van Hoof

TL;DR
This paper introduces a gradient-based meta-learning approach to develop reusable options in hierarchical reinforcement learning, enabling faster adaptation across tasks and outperforming existing methods.
Contribution
The paper formulates the learning of options as a gradient-based meta-learning problem, addressing shortcomings of prior end-to-end option-learning approaches and improving transferability and learning speed.
Findings
Learned options are transferable and accelerate learning.
The proposed method outperforms existing approaches.
Ablation studies confirm the effectiveness of meta-learning components.
Abstract
Hierarchical methods in reinforcement learning have the potential to reduce the number of decisions that the agent needs to make when learning new tasks. However, finding reusable, useful temporal abstractions that facilitate fast learning remains a challenging problem. Recently, several deep learning approaches were proposed to learn such temporal abstractions in the form of options in an end-to-end manner. In this work, we point out several shortcomings of these methods and discuss their potential negative consequences. Subsequently, we formulate the desiderata for reusable options and use these to frame the problem of learning options as a gradient-based meta-learning problem. This allows us to formulate an objective that explicitly incentivizes options which allow a higher-level decision maker to adjust in few steps to different tasks. Experimentally, we show that our method is…
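The idea of an objective that rewards options for letting a higher-level decision maker adjust in a few steps can be illustrated with a toy sketch. This is not the authors' algorithm or environment: the 2-D "options", the four target tasks, the softmax high-level policy, the squared-distance loss, and every name and hyperparameter below are assumptions made for illustration. To stay dependency-free, central finite differences stand in for automatic differentiation through the inner loop.

```python
import math
import random

# Hypothetical toy setup (not from the paper): each "option" is a 2-D point,
# each task is a 2-D target, and the high-level policy is a softmax over options.
TASKS = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
NUM_OPTIONS = 4
INNER_STEPS = 3   # the "few steps" of high-level adaptation
INNER_LR = 0.5    # assumed inner-loop step size
OUTER_LR = 0.05   # assumed outer-loop step size
EPS = 1e-5        # finite-difference step


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mix(options, weights):
    """Weighted combination of the options under the high-level policy."""
    return [sum(w * o[d] for w, o in zip(weights, options)) for d in range(2)]


def task_loss(options, logits, target):
    """Squared distance between the mixed option output and the task target."""
    point = mix(options, softmax(logits))
    return sum((point[d] - target[d]) ** 2 for d in range(2))


def adapt(options, target):
    """Inner loop: a few gradient steps on the high-level logits only."""
    logits = [0.0] * len(options)
    for _ in range(INNER_STEPS):
        w = softmax(logits)
        point = mix(options, w)
        resid = [point[d] - target[d] for d in range(2)]
        # Gradient of the loss with respect to each mixing weight.
        dw = [2.0 * sum(o[d] * resid[d] for d in range(2)) for o in options]
        avg = sum(wk * dk for wk, dk in zip(w, dw))
        # Chain rule through the softmax.
        grad = [wk * (dk - avg) for wk, dk in zip(w, dw)]
        logits = [l - INNER_LR * g for l, g in zip(logits, grad)]
    return logits


def meta_loss(options):
    """Average post-adaptation loss over all tasks (the outer objective)."""
    return sum(task_loss(options, adapt(options, t), t) for t in TASKS) / len(TASKS)


def meta_train(options, steps=300):
    """Outer loop: gradient descent on the options through the inner loop."""
    options = [list(o) for o in options]
    for _ in range(steps):
        grad = [[0.0, 0.0] for _ in options]
        for k in range(len(options)):
            for d in range(2):
                original = options[k][d]
                options[k][d] = original + EPS
                up = meta_loss(options)
                options[k][d] = original - EPS
                down = meta_loss(options)
                options[k][d] = original
                grad[k][d] = (up - down) / (2.0 * EPS)
        for k in range(len(options)):
            for d in range(2):
                options[k][d] -= OUTER_LR * grad[k][d]
    return options


rng = random.Random(0)
initial_options = [[rng.uniform(-0.5, 0.5), rng.uniform(-0.5, 0.5)]
                   for _ in range(NUM_OPTIONS)]
loss_before = meta_loss(initial_options)
learned_options = meta_train(initial_options)
loss_after = meta_loss(learned_options)
print("post-adaptation loss before meta-training:", loss_before)
print("post-adaptation loss after meta-training:", loss_after)
```

In this sketch only the high-level logits change inside a task, while the shared options change only in the outer loop. The outer objective is measured after adaptation, so it favors options that make a few high-level steps sufficient.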
Taxonomy
Topics: Machine Learning and Data Classification · Data Stream Mining Techniques · Explainable Artificial Intelligence (XAI)
