RL$^2$: Fast Reinforcement Learning via Slow Reinforcement Learning

Yan Duan; John Schulman; Xi Chen; Peter L. Bartlett; Ilya Sutskever; Pieter Abbeel

arXiv:1611.02779 · cs.AI · November 11, 2016 · 501 cites

TL;DR

RL$^2$ encodes a fast reinforcement learning algorithm in the weights of a recurrent neural network, which are trained by a slow, general-purpose RL algorithm. The resulting agent adapts quickly to new tasks and, on small-scale problems, performs close to human-designed algorithms.

Contribution

The paper proposes RL$^2$, a method that encodes fast RL algorithms in RNNs trained through slow RL, aiming to close the gap between the few-trial learning of animals and the sample-hungry learning of deep RL.

Findings

1. RL$^2$ performs well on small-scale problems such as randomly generated multi-armed bandits and finite MDPs.

2. After training, RL$^2$ performs close to human-designed algorithms with optimality guarantees on new, unseen MDPs.

3. The approach scales to high-dimensional, vision-based navigation tasks.
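In the bandit setting, each trial is a freshly sampled bandit and every pull is a one-step episode, so the agent's memory has to summarize the outcomes of earlier pulls. The sketch below illustrates only that trial structure. It is not RL$^2$ itself: the learned RNN is replaced by a hypothetical hand-written agent whose "hidden state" is explicit per-arm (successes, pulls) counts, and the task distribution (Bernoulli arms with success probabilities drawn uniformly from [0, 1]) is an assumption made for illustration. The names `init_hidden`, `agent_step`, `sample_bandit` and `run_trial` are hypothetical.

```python
import random
from typing import List, Tuple

# Hypothetical stand-in for the learned RNN state: per-arm (successes, pulls).
Hidden = List[Tuple[int, int]]

def init_hidden(k: int) -> Hidden:
    return [(0, 0) for _ in range(k)]

def agent_step(hidden: Hidden, prev_action: int, prev_reward: float,
               rng: random.Random) -> Tuple[int, Hidden]:
    """Fold the previous (action, reward) into the state, then choose an arm."""
    hidden = list(hidden)  # copy so the caller's state is not mutated
    if prev_action >= 0:   # -1 marks the first pull of a trial
        s, n = hidden[prev_action]
        hidden[prev_action] = (s + int(prev_reward), n + 1)
    # Heuristic (not learned): pull each untried arm once, then act greedily.
    for arm, (_, n) in enumerate(hidden):
        if n == 0:
            return arm, hidden
    means = [s / n for s, n in hidden]
    best = max(means)
    return rng.choice([a for a, m in enumerate(means) if m == best]), hidden

def sample_bandit(k: int, rng: random.Random) -> List[float]:
    # Assumption: Bernoulli arms with success probabilities drawn from U[0, 1].
    return [rng.random() for _ in range(k)]

def run_trial(probs: List[float], n_pulls: int, rng: random.Random) -> float:
    hidden = init_hidden(len(probs))  # memory is reset only when a new bandit is drawn
    prev_action, prev_reward, total = -1, 0.0, 0.0
    for _ in range(n_pulls):
        action, hidden = agent_step(hidden, prev_action, prev_reward, rng)
        reward = 1.0 if rng.random() < probs[action] else 0.0
        total += reward
        prev_action, prev_reward = action, reward
    return total

rng = random.Random(0)
scores = [run_trial(sample_bandit(5, rng), 100, rng) for _ in range(200)]
average = sum(scores) / len(scores)  # estimate of expected total reward per trial
```

A trained RL$^2$ policy would have to discover some such summary of past pulls, along with a better exploration rule than this pull-once-then-greedy heuristic, using only the reward signal from the outer RL loop.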

Abstract

Deep reinforcement learning (deep RL) has been successful in learning sophisticated behaviors automatically; however, the learning process requires a huge number of trials. In contrast, animals can learn new tasks in just a few trials, benefiting from their prior knowledge about the world. This paper seeks to bridge this gap. Rather than designing a "fast" reinforcement learning algorithm, we propose to represent it as a recurrent neural network (RNN) and learn it from data. In our proposed method, RL$^2$, the algorithm is encoded in the weights of the RNN, which are learned slowly through a general-purpose ("slow") RL algorithm. The RNN receives all information a typical RL algorithm would receive, including observations, actions, rewards, and termination flags; and it retains its state across episodes in a given Markov Decision Process (MDP). The activations of the RNN store the state…
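The interface described in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `TinyRNNPolicy` is a hypothetical, untrained Elman RNN with random weights, `sample_mdp` is a hypothetical toy task distribution with deterministic transitions, and every episode is assumed to start in state 0. What it shows is the data flow: at each step the RNN sees the current observation together with the previous action, reward and termination flag, and its hidden state is reset only when a new MDP (trial) begins, not at episode boundaries.

```python
import math
import random
from typing import List, Tuple

def one_hot(i: int, n: int) -> List[float]:
    """One-hot vector; an out-of-range index (-1 = 'no previous action') gives zeros."""
    v = [0.0] * n
    if 0 <= i < n:
        v[i] = 1.0
    return v

def rnn_input(obs: int, prev_action: int, prev_reward: float, prev_done: bool,
              n_obs: int, n_actions: int) -> List[float]:
    # x_t = [one-hot(s_t), one-hot(a_{t-1}), r_{t-1}, d_{t-1}]
    return (one_hot(obs, n_obs) + one_hot(prev_action, n_actions)
            + [prev_reward, 1.0 if prev_done else 0.0])

class TinyRNNPolicy:
    """Hypothetical untrained Elman RNN policy; in RL^2 the slow RL algorithm learns these weights."""

    def __init__(self, n_in: int, n_hidden: int, n_actions: int, rng: random.Random):
        def mat(rows: int, cols: int) -> List[List[float]]:
            return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]
        self.W_xh = mat(n_hidden, n_in)
        self.W_hh = mat(n_hidden, n_hidden)
        self.W_hy = mat(n_actions, n_hidden)
        self.n_hidden = n_hidden

    def initial_state(self) -> List[float]:
        return [0.0] * self.n_hidden

    def step(self, x: List[float], h: List[float],
             rng: random.Random) -> Tuple[int, List[float]]:
        # h_t = tanh(W_xh x_t + W_hh h_{t-1}); action sampled from softmax(W_hy h_t)
        h_new = [math.tanh(sum(w * xi for w, xi in zip(wx, x))
                           + sum(w * hi for w, hi in zip(wh, h)))
                 for wx, wh in zip(self.W_xh, self.W_hh)]
        logits = [sum(w * hi for w, hi in zip(row, h_new)) for row in self.W_hy]
        top = max(logits)
        weights = [math.exp(z - top) for z in logits]  # unnormalized softmax
        action = rng.choices(range(len(weights)), weights=weights)[0]
        return action, h_new

def sample_mdp(n_states: int, n_actions: int, rng: random.Random):
    """Hypothetical task distribution: deterministic next states, rewards in [0, 1)."""
    nxt = [[rng.randrange(n_states) for _ in range(n_actions)] for _ in range(n_states)]
    rew = [[rng.random() for _ in range(n_actions)] for _ in range(n_states)]
    return nxt, rew

def rollout_trial(policy: TinyRNNPolicy, mdp, n_states: int, n_actions: int,
                  n_episodes: int, horizon: int, rng: random.Random) -> List[float]:
    nxt, rew = mdp
    h = policy.initial_state()      # reset ONCE, when a new MDP (trial) begins
    prev_a, prev_r, prev_d = -1, 0.0, False
    returns = []
    for _ in range(n_episodes):
        s, ep_return = 0, 0.0       # assumption: every episode starts in state 0
        for t in range(horizon):
            x = rnn_input(s, prev_a, prev_r, prev_d, n_states, n_actions)
            a, h = policy.step(x, h, rng)  # h is NOT reset between episodes
            r, s = rew[s][a], nxt[s][a]
            ep_return += r
            prev_a, prev_r, prev_d = a, r, t == horizon - 1
        returns.append(ep_return)
    return returns

rng = random.Random(0)
n_states, n_actions = 4, 2
policy = TinyRNNPolicy(n_states + n_actions + 2, 8, n_actions, rng)
mdp = sample_mdp(n_states, n_actions, rng)
returns = rollout_trial(policy, mdp, n_states, n_actions,
                        n_episodes=3, horizon=5, rng=rng)
```

The slow outer RL algorithm, not shown here, would adjust the RNN weights to maximize the total reward accumulated over the whole trial (here `sum(returns)`), which favors policies that explore in early episodes and exploit what they have learned in later ones.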

Taxonomy

Topics: Reinforcement Learning in Robotics · Optimization and Search Problems · Advanced Neural Network Applications