Tackling the Zero-Shot Reinforcement Learning Loss Directly
Yann Ollivier

TL;DR
This paper proves that the zero-shot reinforcement learning loss can be optimized directly for a range of non-informative priors on downstream tasks. The analysis relates the resulting objectives to existing methods such as VISR and shows that some priors yield narrow optimal features.
Contribution
It proves that the zero-shot RL loss can be optimized directly for several priors, connects the resulting objectives to existing approaches, and analyzes the implications.
Findings
White noise prior leads to an objective similar to VISR
Some priors produce narrow optimal features
Direct optimization offers new insights into zero-shot RL
Abstract
Zero-shot reinforcement learning (RL) methods aim at instantly producing a behavior for an RL task in a given environment, from a description of the reward function. These methods are usually tested by evaluating their average performance on a series of downstream tasks. Yet they cannot be trained directly for that objective, unless the distribution of downstream tasks is known. Existing approaches either use other learning criteria [BBQ+18, TRO23, TO21, HDB+19], or explicitly set a prior on downstream tasks, such as reward functions given by a random neural network [FPAL24]. Here we prove that the zero-shot RL loss can be optimized directly, for a range of non-informative priors such as white noise rewards, temporally smooth rewards, "scattered" sparse rewards, or a combination of those. Thus, it is possible to learn the optimal zero-shot features algorithmically, for a wide…
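To make the evaluation criterion in the abstract concrete, here is a minimal sketch of "average performance on a series of downstream tasks" under a white noise reward prior. It is not the paper's algorithm. It assumes a small tabular MDP, a reward that depends only on the current state, i.i.d. Gaussian rewards per state as the white noise prior, and a hypothetical zero-shot method that simply picks the best policy from a fixed candidate set for each sampled reward. The names `policy_value` and `zero_shot_score` are illustrative only.

```python
import numpy as np

def policy_value(P, pi, r, gamma, rho0):
    """Expected discounted return of policy pi for a state reward r.

    P: (S, A, S) transition probabilities, pi: (S, A) action probabilities,
    r: (S,) reward per state, rho0: (S,) initial state distribution.
    """
    n_states = P.shape[0]
    # State-to-state transition matrix induced by pi.
    P_pi = np.einsum("sa,sat->st", pi, P)
    # Solve the Bellman equation V = r + gamma * P_pi V.
    V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, r)
    return float(rho0 @ V)

def zero_shot_score(P, policies, gamma, rho0, n_tasks=200, seed=0):
    """Monte Carlo estimate of average performance over sampled tasks.

    Assumption: the "zero-shot method" is an oracle that selects the best
    candidate policy for each reward; a real method would infer the policy
    from a description of the reward instead.
    """
    rng = np.random.default_rng(seed)
    n_states = P.shape[0]
    total = 0.0
    for _ in range(n_tasks):
        # White noise prior (assumed form): i.i.d. standard Gaussian per state.
        r = rng.standard_normal(n_states)
        total += max(policy_value(P, pi, r, gamma, rho0) for pi in policies)
    return total / n_tasks
```

With the same seed, enlarging the candidate set can only keep or raise the score, which gives a quick sanity check when experimenting with the sketch.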
Taxonomy
Topics: Neuroscience and Neural Engineering
Methods: Sparse Evolutionary Training
