Upside Down Reinforcement Learning with Policy Generators

Jacopo Di Ventura, Dylan R. Ashley, Vincent Herrmann, Francesco Faccio, Jürgen Schmidhuber

arXiv:2501.16288 · cs.LG · January 29, 2025

TL;DR

This paper introduces UDRLPG, a novel method that uses hypernetworks to generate policies conditioned on commands, improving sample efficiency and enabling zero-shot generalization in reinforcement learning tasks.

Contribution

The paper extends Upside Down Reinforcement Learning by integrating hypernetworks to generate command-conditioned policies without requiring an evaluator, enhancing efficiency and generalization.

Findings

01. Achieves competitive performance and high returns in RL tasks.

02. Generalizes zero-shot to return commands not seen during training.

03. Improves empirical convergence despite the increased variance in returns.

Abstract

Upside Down Reinforcement Learning (UDRL) is a promising framework for solving reinforcement learning problems which focuses on learning command-conditioned policies. In this work, we extend UDRL to the task of learning a command-conditioned generator of deep neural network policies. We accomplish this using Hypernetworks - a variant of Fast Weight Programmers, which learn to decode input commands representing a desired expected return into command-specific weight matrices. Our method, dubbed Upside Down Reinforcement Learning with Policy Generators (UDRLPG), streamlines comparable techniques by removing the need for an evaluator or critic to update the weights of the generator. To counteract the increased variance in last returns caused by not having an evaluator, we decouple the sampling probability of the buffer from the absolute number of policies in it, which, together with a…
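
The core idea in the abstract, decoding a command that represents a desired return into the weights of a policy, can be illustrated with a minimal sketch. This is not the authors' implementation: the class `PolicyGenerator`, the function `act`, the affine form of the generator, the linear policy, and all dimensions are hypothetical choices made to keep the example short and dependency-free. Training of the generator is omitted entirely.

```python
import random


class PolicyGenerator:
    """Hypothetical toy hypernetwork: scalar command -> weights of a linear policy."""

    def __init__(self, obs_dim, n_actions, seed=0):
        rng = random.Random(seed)
        self.obs_dim = obs_dim
        self.n_actions = n_actions
        # One weight per (action, observation feature) pair, plus one bias per action.
        n_params = n_actions * obs_dim + n_actions
        # Assumption: every policy parameter is an affine function of the command.
        self.slope = [rng.gauss(0.0, 0.1) for _ in range(n_params)]
        self.offset = [rng.gauss(0.0, 0.1) for _ in range(n_params)]

    def generate(self, command):
        """Decode a desired return into a command-specific weight matrix and bias."""
        flat = [s * command + o for s, o in zip(self.slope, self.offset)]
        split = self.n_actions * self.obs_dim
        weights = [
            flat[i * self.obs_dim:(i + 1) * self.obs_dim]
            for i in range(self.n_actions)
        ]
        bias = flat[split:]
        return weights, bias


def act(policy, observation):
    """Return the greedy action of a generated linear policy."""
    weights, bias = policy
    logits = [
        sum(w * x for w, x in zip(row, observation)) + b
        for row, b in zip(weights, bias)
    ]
    return max(range(len(logits)), key=logits.__getitem__)


generator = PolicyGenerator(obs_dim=4, n_actions=2)
policy_low = generator.generate(command=0.1)   # policy requested for a low return
policy_high = generator.generate(command=1.0)  # policy requested for a high return
action = act(policy_high, [0.1, -0.2, 0.05, 0.3])
```

Each call to `generate` yields a separate set of policy weights, so different commands produce different policies from the same generator. In this sketch the generator is untrained, so the generated policies carry no guarantee of achieving the commanded return.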

Code & Models

Repositories

jacopod/udrlpg_ (PyTorch, official)

Taxonomy

Topics: Supply Chain and Inventory Management