Meta-Reinforcement Learning of Structured Exploration Strategies

Abhishek Gupta; Russell Mendonca; YuXuan Liu; Pieter Abbeel; Sergey Levine

arXiv:1802.07245 · cs.LG · February 21, 2018 · 180 cites

TL;DR

This paper introduces MAESN, a meta-reinforcement learning algorithm that learns structured exploration strategies from prior tasks, significantly improving exploration efficiency in various simulated robotics tasks.

Contribution

The paper presents MAESN, a novel gradient-based meta-RL method that learns structured exploration strategies, outperforming prior meta-RL and task-agnostic exploration approaches.

Findings

01. MAESN outperforms prior meta-RL methods in exploration tasks.

02. Structured stochasticity improves exploration effectiveness.

03. Method is validated on diverse simulated robotics tasks.

Abstract

Exploration is a fundamental challenge in reinforcement learning (RL). Many of the current exploration methods for deep RL use task-agnostic objectives, such as information gain or bonuses based on state visitation. However, many practical applications of RL involve learning more than a single task, and prior tasks can be used to inform how exploration should be performed in new tasks. In this work, we explore how prior tasks can inform an agent about how to explore effectively in new situations. We introduce a novel gradient-based fast adaptation algorithm -- model agnostic exploration with structured noise (MAESN) -- to learn exploration strategies from prior experience. The prior experience is used both to initialize a policy and to acquire a latent exploration space that can inject structured stochasticity into a policy, producing exploration strategies that are informed by prior…
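
To make the idea of structured stochasticity from a latent exploration space more concrete, here is a minimal sketch in Python with NumPy. It is not the authors' implementation: the linear policy, the toy dynamics in `step_fn`, and every name (`sample_latent`, `policy_action`, `rollout`, `W`, `mu`, `log_sigma`) are assumptions chosen for illustration. The one point it aims to show is that a latent variable sampled once per episode and held fixed produces temporally coherent exploration, in contrast to noise added independently to each action.

```python
import numpy as np

def sample_latent(mu, log_sigma, rng):
    """Draw one latent z from a diagonal Gaussian with mean mu and std exp(log_sigma)."""
    return mu + np.exp(log_sigma) * rng.standard_normal(mu.shape)

def policy_action(W, state, z):
    """Hypothetical linear policy: the action depends on both the state and the latent."""
    return W @ np.concatenate([state, z])

def rollout(W, mu, log_sigma, step_fn, init_state, horizon, rng):
    """Run one episode with a single latent sample held fixed for every step."""
    z = sample_latent(mu, log_sigma, rng)  # sampled once per episode, not per step
    state = init_state
    states = [state]
    for _ in range(horizon):
        action = policy_action(W, state, z)
        state = step_fn(state, action)
        states.append(state)
    return z, np.stack(states)

def step_fn(state, action):
    """Toy dynamics (assumption): the state moves a small step along the action."""
    return state + 0.1 * action

rng = np.random.default_rng(0)
state_dim, latent_dim = 2, 2

# Assumed weights: ignore the state and output the latent directly as the action,
# so each episode heads in one consistent direction determined by z.
W = np.hstack([np.zeros((state_dim, state_dim)), np.eye(latent_dim)])

# Assumed latent distribution parameters (standard normal).
mu = np.zeros(latent_dim)
log_sigma = np.zeros(latent_dim)

z, states = rollout(W, mu, log_sigma, step_fn, np.zeros(state_dim), 10, rng)
```

In this toy setting each episode travels in a straight line whose direction is set by `z`, so different latent samples cover different regions of the state space. In a gradient-based fast adaptation scheme of the kind the abstract describes, quantities playing the role of `W`, `mu`, and `log_sigma` would be learned from prior tasks and then adapted on a new task; that training loop is omitted here.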

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Multi-Objective Optimization Algorithms · Advanced Bandit Algorithms Research