Training Agents using Upside-Down Reinforcement Learning
Rupesh Kumar Srivastava, Pranav Shyam, Filipe Mutz, Wojciech, Ja\'skowski, J\"urgen Schmidhuber

TL;DR
Upside-Down Reinforcement Learning (UDRL) trains agents using supervised learning to follow commands, avoiding reward prediction and policy search, and achieves competitive results across various environments.
Contribution
This paper introduces UDRL, a simple, reward-free approach to training agents that can outperform some traditional reinforcement learning algorithms.
Findings
UDRL performs competitively on several tasks.
UDRL can surpass traditional algorithms in some environments.
The approach offers a new perspective on agent training.
Abstract
We develop Upside-Down Reinforcement Learning (UDRL), a method for learning to act using only supervised learning techniques. Unlike traditional algorithms, UDRL does not use reward prediction or search for an optimal policy. Instead, it trains agents to follow commands such as "obtain so much total reward in so much time." Many of its general principles are outlined in a companion report; the goal of this paper is to develop a practical learning algorithm and show that this conceptually simple perspective on agent training can produce a range of rewarding behaviors for multiple episodic environments. Experiments show that on some tasks UDRL's performance can be surprisingly competitive with, and even exceed that of some traditional baseline algorithms developed over decades of research. Based on these results, we suggest that alternative approaches to expected reward maximization have…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsReinforcement Learning in Robotics · Evolutionary Algorithms and Applications · Data Stream Mining Techniques
