RLZero: Direct Policy Inference from Language Without In-Domain Supervision
Harshit Sikchi, Siddhant Agarwal, Pranaya Jajoo, Samyak Parajuli, Caleb Chuck, Max Rudolph, Peter Stone, Amy Zhang, Scott Niekum

TL;DR
RLZero enables zero-shot policy inference from natural language instructions by imagining, projecting, and imitating observations without in-domain supervision, leveraging pretrained RL agents and generative models.
Contribution
It introduces RLZero, a framework that infers policies directly from language without task-specific supervision or labeled data, using a three-step process: imagine, project, and imitate.
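The "project" step maps imagined frames onto observations the agent has actually seen. The sketch below is an illustration under stated assumptions, not the authors' implementation. It assumes that both the imagined frames and the offline dataset observations have already been encoded into feature vectors by some pretrained image encoder. This page specifies neither the encoder nor the similarity measure, so cosine nearest-neighbor retrieval is an assumed stand-in. The function name `project_to_dataset` is hypothetical.

```python
import numpy as np


def project_to_dataset(imagined_emb: np.ndarray, dataset_emb: np.ndarray) -> np.ndarray:
    """Hypothetical projection step: for each imagined frame embedding, return the
    index of the most similar dataset observation embedding (cosine similarity).

    imagined_emb: (num_imagined, dim) features of generated frames (assumed given).
    dataset_emb:  (num_dataset, dim) features of offline observations (assumed given).
    """
    # Normalize rows so that a dot product equals cosine similarity.
    a = imagined_emb / np.linalg.norm(imagined_emb, axis=1, keepdims=True)
    b = dataset_emb / np.linalg.norm(dataset_emb, axis=1, keepdims=True)
    sims = a @ b.T  # shape (num_imagined, num_dataset)
    # The nearest in-distribution observation for each imagined frame.
    return np.argmax(sims, axis=1)


# Usage: two imagined frames projected onto a three-observation dataset.
dataset = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
imagined = np.array([[0.9, 0.1], [0.1, 2.0]])
indices = project_to_dataset(imagined, dataset)
```

Retrieving real dataset observations, instead of feeding generated frames to the agent directly, keeps the inputs inside the distribution the pretrained agent was trained on.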
Findings
First zero-shot language-to-behavior generation, demonstrated across multiple tasks.
Effective policy imitation from imagined observations.
Ability to infer policies from cross-embodiment videos, such as those found on YouTube.
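To illustrate how imitation from projected observations can be zero-shot, the sketch below assumes a forward-backward (FB) style behavior foundation model as the pretrained agent. This page does not say which agent RLZero uses, so treat that choice as an assumption. Under this assumption, a task vector `z` is inferred by averaging backward embeddings `B(s)` over the projected states. The agent then acts greedily with respect to `Q_z(s, a) = F(s, a, z) · z`. Everything here is hypothetical: `backward_map` is a toy linear backward network, `forward_fn` is an arbitrary callable standing in for `F`, and actions are discrete.

```python
from typing import Callable, Sequence

import numpy as np


def infer_task_vector(projected_states: np.ndarray, backward_map: np.ndarray) -> np.ndarray:
    """Hypothetical FB-style zero-shot imitation: z = mean over states of B(s),
    with B(s) = backward_map @ s as a toy linear backward embedding.
    (Real FB implementations may also rescale z; that is omitted here.)"""
    b = projected_states @ backward_map.T  # (num_states, d) backward embeddings
    return b.mean(axis=0)


def greedy_action(
    state: np.ndarray,
    actions: Sequence,
    forward_fn: Callable[[np.ndarray, object, np.ndarray], np.ndarray],
    z: np.ndarray,
):
    """Pick the discrete action maximizing Q_z(s, a) = F(s, a, z) . z."""
    q_values = [float(forward_fn(state, a, z) @ z) for a in actions]
    return actions[int(np.argmax(q_values))]


# Usage with toy components: identity backward map, and F(s, a, z) = a.
W = np.eye(2)
states = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
z = infer_task_vector(states, W)
acts = [(1.0, 0.0), (0.0, 1.0)]
best = greedy_action(np.zeros(2), acts, lambda s, a, z: np.array(a), z)
```

The appeal of this style of imitation is that no gradient step is taken at test time. Inferring `z` is a single average over embeddings, which matches the "without test-time training" framing.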
Abstract
The reward hypothesis states that all goals and purposes can be understood as the maximization of a received scalar reward signal. However, in practice, defining such a reward signal is notoriously difficult, as humans are often unable to predict the optimal behavior corresponding to a reward function. Natural language offers an intuitive alternative for instructing reinforcement learning (RL) agents, yet previous language-conditioned approaches either require costly supervision or test-time training given a language instruction. In this work, we present a new approach that uses a pretrained RL agent trained using only unlabeled, offline interactions--without task-specific supervision or labeled trajectories--to get zero-shot test-time policy inference from arbitrary natural language instructions. We introduce a framework comprising three steps: imagine, project, and imitate. First, the…
