Guiding Pretraining in Reinforcement Learning with Large Language Models
Yuqing Du, Olivia Watkins, Zihan Wang, Cédric Colas, Trevor Darrell, Pieter Abbeel, Abhishek Gupta, Jacob Andreas

TL;DR
This paper introduces ELLM, a method that uses large language models to guide exploration in reinforcement learning. The agent is rewarded for achieving goals that a language model suggests when prompted with a description of the agent's current state, which improves coverage of meaningful behaviors.
Contribution
The paper presents a novel approach that uses large-scale language model pretraining to shape exploration in reinforcement learning without human intervention.
Findings
ELLM improves coverage of common-sense behaviors during pretraining.
ELLM-trained agents match or outperform baseline methods on downstream tasks.
The approach effectively guides agents toward human-meaningful behaviors.
Abstract
Reinforcement learning algorithms typically struggle in the absence of a dense, well-shaped reward function. Intrinsically motivated exploration methods address this limitation by rewarding agents for visiting novel states or transitions, but these methods offer limited benefits in large environments where most discovered novelty is irrelevant for downstream tasks. We describe a method that uses background knowledge from text corpora to shape exploration. This method, called ELLM (Exploring with LLMs), rewards an agent for achieving goals suggested by a language model prompted with a description of the agent's current state. By leveraging large-scale language model pretraining, ELLM guides agents toward human-meaningful and plausibly useful behaviors without requiring a human in the loop. We evaluate ELLM in the Crafter game environment and the Housekeep robotic simulator, showing that…
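The reward idea described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say how goal achievement is measured, so the sketch assumes a simple token-overlap (Jaccard) similarity and a fixed threshold. The names `goal_reward`, `jaccard`, and `fake_llm` are hypothetical, and `fake_llm` is a canned stub standing in for a real language model query.

```python
from typing import Callable, List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase a caption and split it into a set of word tokens."""
    return set(text.lower().split())


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity in [0, 1] (assumed stand-in for a text similarity)."""
    ta, tb = tokenize(a), tokenize(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def goal_reward(
    state_caption: str,
    transition_caption: str,
    suggest_goals: Callable[[str], List[str]],
    threshold: float = 0.5,  # assumed cutoff, not taken from the paper
) -> float:
    """Reward the agent if what it just did resembles a suggested goal.

    state_caption: text description of the agent's current state.
    transition_caption: text description of what the agent's action achieved.
    suggest_goals: callable that maps a state description to candidate goals.
    """
    goals = suggest_goals(state_caption)
    if not goals:
        return 0.0
    best = max(jaccard(transition_caption, goal) for goal in goals)
    # Below the threshold the behavior is treated as unrelated to any goal.
    return best if best >= threshold else 0.0


def fake_llm(state_caption: str) -> List[str]:
    """Hypothetical stub: returns canned goals instead of querying a real model."""
    if "tree" in state_caption:
        return ["chop tree", "collect wood"]
    return ["explore"]


if __name__ == "__main__":
    print(goal_reward("you see a tree", "chop tree", fake_llm))
    print(goal_reward("you see a tree", "jump in place", fake_llm))
```

In this sketch, a transition that matches a suggested goal earns a positive reward, while a novel but unrelated transition earns nothing. That is the contrast the abstract draws with novelty-based exploration.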
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
