All You Need Is Supervised Learning: From Imitation Learning to Meta-RL With Upside Down RL
Kai Arulkumaran, Dylan R. Ashley, Jürgen Schmidhuber, Rupesh K. Srivastava

TL;DR
This paper introduces Upside Down Reinforcement Learning (UDRL), a supervised learning approach that unifies various RL paradigms, including imitation learning, offline RL, goal-conditioned RL, and meta-RL, using a single algorithm.
Contribution
The paper shows that UDRL, previously demonstrated only in the online RL setting, also works in imitation learning and offline RL and can be extended to goal-conditioned RL and meta-RL, all with one general agent architecture.
Findings
UDRL works effectively in imitation learning and offline RL.
A single UDRL agent can learn across multiple RL paradigms.
UDRL bypasses bootstrapping, off-policy corrections, and discount factors.
Abstract
Upside down reinforcement learning (UDRL) flips the conventional use of the return in the objective function in RL upside down, by taking returns as input and predicting actions. UDRL is based purely on supervised learning, and bypasses some prominent issues in RL: bootstrapping, off-policy corrections, and discount factors. While previous work with UDRL demonstrated it in a traditional online RL setting, here we show that this single algorithm can also work in the imitation learning and offline RL settings, be extended to the goal-conditioned RL setting, and even the meta-RL setting. With a general agent architecture, a single UDRL agent can learn across all paradigms.
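The "returns as input, actions as output" idea can be illustrated by how a recorded episode might be turned into supervised training examples. The following is a minimal sketch, not the authors' implementation: the names `Step`, `Example`, and `make_training_examples` are hypothetical, and it assumes each command consists of the undiscounted return and the number of steps remaining until the end of the episode.

```python
from typing import List, NamedTuple, Tuple


class Step(NamedTuple):
    """One recorded transition (hypothetical container for this sketch)."""
    state: Tuple[float, ...]
    action: int
    reward: float


class Example(NamedTuple):
    """One supervised example: inputs are state + command, target is action."""
    state: Tuple[float, ...]
    desired_return: float   # undiscounted sum of rewards from this step on
    desired_horizon: int    # number of steps left in the episode
    action: int             # label the behavior function should predict


def make_training_examples(episode: List[Step]) -> List[Example]:
    """Relabel an episode with the return and horizon it actually achieved.

    Assumption: commands run to the end of the episode; no discounting and
    no bootstrapped value estimates are involved.
    """
    examples: List[Example] = []
    return_to_go = 0.0
    # Walk backwards so the return-to-go accumulates in a single pass.
    for t in range(len(episode) - 1, -1, -1):
        step = episode[t]
        return_to_go += step.reward
        horizon = len(episode) - t
        examples.append(Example(step.state, return_to_go, horizon, step.action))
    examples.reverse()  # restore chronological order
    return examples


# Toy episode with three steps (made-up data for illustration only).
episode = [
    Step(state=(0.0,), action=1, reward=1.0),
    Step(state=(1.0,), action=0, reward=0.0),
    Step(state=(2.0,), action=1, reward=2.0),
]
examples = make_training_examples(episode)
```

Any classifier or regressor could then be fit on these examples, mapping (state, desired return, desired horizon) to the action. Because the labels come from what the episode actually achieved, the same relabeling applies to data from any source, such as demonstrations or a fixed offline dataset.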
Taxonomy
Topics: Auction Theory and Applications · Reinforcement Learning in Robotics
