RL$^2$: Fast Reinforcement Learning via Slow Reinforcement Learning
Yan Duan, John Schulman, Xi Chen, Peter L. Bartlett, Ilya Sutskever, Pieter Abbeel

TL;DR
RL$^2$ represents a fast reinforcement learning algorithm as a recurrent neural network whose weights are trained by a slow, general-purpose RL algorithm. The trained network adapts quickly to new tasks and, on small-scale problems, performs close to human-designed algorithms.
Contribution
The paper proposes RL$^2$, a novel method that encodes fast RL algorithms within RNNs trained through slow RL, bridging the gap between animal-like quick learning and deep RL.
Findings
RL$^2$ performs well on small-scale problems: randomly generated multi-armed bandits and finite MDPs.
After training, RL$^2$ performs on new MDPs close to human-designed algorithms that come with optimality guarantees.
The approach scales to high-dimensional, vision-based navigation tasks.
Abstract
Deep reinforcement learning (deep RL) has been successful in learning sophisticated behaviors automatically; however, the learning process requires a huge number of trials. In contrast, animals can learn new tasks in just a few trials, benefiting from their prior knowledge about the world. This paper seeks to bridge this gap. Rather than designing a "fast" reinforcement learning algorithm, we propose to represent it as a recurrent neural network (RNN) and learn it from data. In our proposed method, RL$^2$, the algorithm is encoded in the weights of the RNN, which are learned slowly through a general-purpose ("slow") RL algorithm. The RNN receives all information a typical RL algorithm would receive, including observations, actions, rewards, and termination flags; and it retains its state across episodes in a given Markov Decision Process (MDP). The activations of the RNN store the state…
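The input construction and state handling described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a plain tanh recurrence with random, untrained weights, a Bernoulli bandit as the task, and hypothetical names (`make_input`, `TinyRNNPolicy`, `run_trial`). The outer "slow" RL loop that would train the weights is omitted entirely.

```python
import math
import random


def make_input(obs, prev_action, prev_reward, prev_done, n_actions):
    """Build the RNN input: observation, one-hot previous action,
    previous reward, and previous termination flag."""
    one_hot = [0.0] * n_actions
    if prev_action is not None:
        one_hot[prev_action] = 1.0
    return list(obs) + one_hot + [prev_reward, float(prev_done)]


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


class TinyRNNPolicy:
    """Hypothetical tanh RNN policy with random (untrained) weights."""

    def __init__(self, in_dim, hid_dim, n_actions, rng):
        def init(rows, cols):
            return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.W_in = init(hid_dim, in_dim)
        self.W_h = init(hid_dim, hid_dim)
        self.W_out = init(n_actions, hid_dim)
        self.hid_dim = hid_dim

    def initial_state(self):
        return [0.0] * self.hid_dim

    def step(self, x, h):
        """One recurrent update; returns action probabilities and new state."""
        pre = [a + b for a, b in zip(matvec(self.W_in, x), matvec(self.W_h, h))]
        h_new = [math.tanh(p) for p in pre]
        logits = matvec(self.W_out, h_new)
        m = max(logits)
        exps = [math.exp(l - m) for l in logits]
        z = sum(exps)
        return [e / z for e in exps], h_new


def run_trial(policy, arm_probs, n_episodes, rng):
    """Run several one-step bandit episodes on a single task.

    The hidden state is reset once at the start of the trial and is
    deliberately NOT reset between episodes, so information about the
    task can accumulate in the activations.
    """
    n_actions = len(arm_probs)
    obs = [0.0]  # a bandit has no informative observation
    h = policy.initial_state()
    prev_action, prev_reward, prev_done = None, 0.0, False
    rewards = []
    for _ in range(n_episodes):
        x = make_input(obs, prev_action, prev_reward, prev_done, n_actions)
        probs, h = policy.step(x, h)
        action = rng.choices(range(n_actions), weights=probs)[0]
        reward = 1.0 if rng.random() < arm_probs[action] else 0.0
        # Every bandit episode ends after one step, so the flag is True.
        prev_action, prev_reward, prev_done = action, reward, True
        rewards.append(reward)
    return rewards, h


if __name__ == "__main__":
    rng = random.Random(0)
    policy = TinyRNNPolicy(in_dim=1 + 3 + 2, hid_dim=8, n_actions=3, rng=rng)
    rewards, final_state = run_trial(policy, [0.1, 0.5, 0.9], n_episodes=20, rng=rng)
    print("total reward over the trial:", sum(rewards))
```

With untrained weights the policy acts essentially at random; the point of the sketch is only the data flow, namely that the previous action, reward, and termination flag are fed back as input and that the hidden state carries over between episodes of the same task.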
Taxonomy
Topics: Reinforcement Learning in Robotics · Optimization and Search Problems · Advanced Neural Network Applications
