Unsupervised Meta-Learning for Reinforcement Learning
Abhishek Gupta, Benjamin Eysenbach, Chelsea Finn, Sergey Levine

TL;DR
This paper introduces unsupervised meta-learning algorithms for reinforcement learning that automate task design, enabling faster learning without manual task specification and outperforming learning from scratch.
Contribution
It formulates the unsupervised meta-reinforcement learning problem and proposes a method using mutual information for task proposal to train effective meta-learners.
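As a rough illustration of how mutual information can drive task proposal, the sketch below computes a per-state pseudo-reward of the form log q(z|s) - log p(z). This is the standard variational lower bound on the mutual information between a latent task variable z and visited states s. The summary above does not spell out the exact objective, so this is an assumption rather than the authors' implementation. The function name `skill_reward`, the uniform prior over tasks, and the list of discriminator probabilities are all hypothetical.

```python
import math


def skill_reward(discriminator_probs, skill, num_skills):
    """Pseudo-reward log q(z|s) - log p(z) for one state.

    discriminator_probs: hypothetical discriminator output q(.|s), a list of
        probabilities over the latent tasks, assumed to sum to 1.
    skill: index z of the latent task that generated the state.
    num_skills: number of latent tasks; the prior p(z) is assumed uniform.
    """
    log_q = math.log(discriminator_probs[skill])
    log_p = -math.log(num_skills)  # log(1 / num_skills) under a uniform prior
    return log_q - log_p


# A state the discriminator cannot attribute to any task earns roughly zero reward.
uninformative = skill_reward([0.25, 0.25, 0.25, 0.25], skill=0, num_skills=4)

# A state that clearly identifies its task earns a positive reward.
distinctive = skill_reward([0.7, 0.1, 0.1, 0.1], skill=0, num_skills=4)
```

Under these assumptions, each latent task z defines its own reward function, and a meta-learner could then be trained on that family of proposed tasks in place of hand-designed ones.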
Findings
Unsupervised meta-learning effectively accelerates reinforcement learning.
Proposed methods outperform learning from scratch.
The approach automates task design in meta-reinforcement learning.
Abstract
Meta-learning algorithms use past experience to learn to quickly solve new tasks. In the context of reinforcement learning, meta-learning algorithms acquire reinforcement learning procedures to solve new problems more efficiently by utilizing experience from prior tasks. The performance of meta-learning algorithms depends on the tasks available for meta-training: in the same way that supervised learning generalizes best to test points drawn from the same distribution as the training points, meta-learning methods generalize best to tasks from the same distribution as the meta-training tasks. In effect, meta-reinforcement learning offloads the design burden from algorithm design to task design. If we can automate the process of task design as well, we can devise a meta-learning algorithm that is truly automated. In this work, we take a step in this direction, proposing a family of…
Taxonomy
Topics: Machine Learning and Data Classification · Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning
