Upside Down Reinforcement Learning with Policy Generators
Jacopo Di Ventura, Dylan R. Ashley, Vincent Herrmann, Francesco Faccio, Jürgen Schmidhuber

TL;DR
This paper introduces UDRLPG, a novel method that uses hypernetworks to generate policies conditioned on commands, improving sample efficiency and enabling zero-shot generalization in reinforcement learning tasks.
Contribution
The paper extends Upside Down Reinforcement Learning by integrating hypernetworks to generate command-conditioned policies without requiring an evaluator, enhancing efficiency and generalization.
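To make the idea of a command-conditioned policy generator concrete, here is a minimal NumPy sketch of the forward pass only. It is an illustration under stated assumptions, not the authors' implementation: the layer sizes, the tanh activations, the scalar command, and the names generate_policy and act are all hypothetical, and training of the generator is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
OBS_DIM, HIDDEN, ACT_DIM = 4, 8, 2   # shape of the generated policy
GEN_HIDDEN = 16                       # width of the generator's hidden layer

# Total parameter count of the generated two-layer policy (weights + biases).
N_POLICY_PARAMS = OBS_DIM * HIDDEN + HIDDEN + HIDDEN * ACT_DIM + ACT_DIM

# Generator (hypernetwork) parameters: scalar command -> flat weight vector.
G_W1 = rng.normal(0.0, 0.1, size=(1, GEN_HIDDEN))
G_b1 = np.zeros(GEN_HIDDEN)
G_W2 = rng.normal(0.0, 0.1, size=(GEN_HIDDEN, N_POLICY_PARAMS))
G_b2 = np.zeros(N_POLICY_PARAMS)


def generate_policy(command):
    """Decode a desired-return command into the weights of a small policy."""
    h = np.tanh(np.array([[command]]) @ G_W1 + G_b1)
    flat = (h @ G_W2 + G_b2).ravel()

    # Slice the flat vector into the policy's weight matrices and biases.
    i = 0
    W1 = flat[i:i + OBS_DIM * HIDDEN].reshape(OBS_DIM, HIDDEN)
    i += OBS_DIM * HIDDEN
    b1 = flat[i:i + HIDDEN]
    i += HIDDEN
    W2 = flat[i:i + HIDDEN * ACT_DIM].reshape(HIDDEN, ACT_DIM)
    i += HIDDEN * ACT_DIM
    b2 = flat[i:i + ACT_DIM]
    return W1, b1, W2, b2


def act(policy, obs):
    """Run the generated policy on a single observation."""
    W1, b1, W2, b2 = policy
    return np.tanh(np.tanh(obs @ W1 + b1) @ W2 + b2)


# Different commands yield different policies from the same generator.
low_return_policy = generate_policy(10.0)
high_return_policy = generate_policy(200.0)
action = act(high_return_policy, np.zeros(OBS_DIM))
```

Note that the generated policy itself never sees the command; in this sketch the command only shapes the weights, which is what distinguishes a policy generator from a policy that takes the command as an extra input.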
Findings
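UDRLPG achieves competitive performance and high returns on RL tasks.
It generalizes zero-shot to return commands not seen during training.
Decoupling the buffer's sampling probability from the number of policies it holds improves empirical convergence, despite the higher variance in returns that comes from having no evaluator.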
Abstract
Upside Down Reinforcement Learning (UDRL) is a promising framework for solving reinforcement learning problems which focuses on learning command-conditioned policies. In this work, we extend UDRL to the task of learning a command-conditioned generator of deep neural network policies. We accomplish this using Hypernetworks - a variant of Fast Weight Programmers, which learn to decode input commands representing a desired expected return into command-specific weight matrices. Our method, dubbed Upside Down Reinforcement Learning with Policy Generators (UDRLPG), streamlines comparable techniques by removing the need for an evaluator or critic to update the weights of the generator. To counteract the increased variance in last returns caused by not having an evaluator, we decouple the sampling probability of the buffer from the absolute number of policies in it, which, together with a…
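The abstract says the buffer's sampling probability is decoupled from the absolute number of policies it holds, but the visible text does not specify how. The sketch below shows one possible reading, which is an assumption rather than the paper's confirmed scheme: policies are grouped into buckets by achieved return, a bucket is drawn uniformly, and then a policy is drawn uniformly within it. The class name BucketedBuffer and the bucket_width parameter are hypothetical.

```python
import random
from collections import defaultdict


class BucketedBuffer:
    """Replay buffer of (policy_params, return) pairs, grouped by return.

    Assumption for illustration: sampling picks a return bucket uniformly,
    so a bucket's chance of being drawn does not grow with its size.
    """

    def __init__(self, bucket_width=10.0, seed=0):
        self.bucket_width = bucket_width
        self.buckets = defaultdict(list)
        self.rng = random.Random(seed)

    def add(self, policy_params, ret):
        # Returns within the same width-sized interval share a bucket.
        key = int(ret // self.bucket_width)
        self.buckets[key].append((policy_params, ret))

    def sample(self):
        # Step 1: uniform over non-empty buckets, ignoring their sizes.
        key = self.rng.choice(list(self.buckets))
        # Step 2: uniform over the policies inside the chosen bucket.
        return self.rng.choice(self.buckets[key])


# 99 low-return policies and a single high-return one.
buffer = BucketedBuffer(bucket_width=10.0, seed=0)
for i in range(99):
    buffer.add(policy_params=f"low_{i}", ret=5.0)
buffer.add(policy_params="high_0", ret=95.0)

draws = [buffer.sample() for _ in range(2000)]
high_fraction = sum(1 for _, ret in draws if ret > 50.0) / len(draws)
```

With plain uniform sampling over all stored policies, the single high-return policy would be drawn about 1% of the time; under this bucketed scheme it is drawn about half the time, since there are two non-empty buckets.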
Taxonomy
Topics: Supply Chain and Inventory Management
