Improving Sample Efficiency of Reinforcement Learning with Background Knowledge from Large Language Models
Fuxiang Zhang, Junyou Li, Yi-Chen Li, Zongzhang Zhang, Yang Yu, Deheng Ye

TL;DR
This paper presents a framework that leverages large language models to extract environment background knowledge, improving sample efficiency in reinforcement learning across various tasks by using knowledge-based reward shaping.
Contribution
It introduces a novel method to extract environment background knowledge from LLMs and applies it to enhance RL sample efficiency through potential-based reward shaping.
Findings
Significant sample efficiency improvements in Minigrid and Crafter tasks.
Effective knowledge extraction from LLMs via different prompting methods.
Potential-based reward shaping with the extracted knowledge preserves policy optimality.
Abstract
Low sample efficiency is an enduring challenge of reinforcement learning (RL). With the advent of versatile large language models (LLMs), recent works impart common-sense knowledge to accelerate policy learning for RL processes. However, we note that such guidance is often tailored for one specific task but loses generalizability. In this paper, we introduce a framework that harnesses LLMs to extract background knowledge of an environment, which contains general understandings of the entire environment, making various downstream RL tasks benefit from one-time knowledge representation. We ground LLMs by feeding a few pre-collected experiences and requesting them to delineate background knowledge of the environment. Afterward, we represent the output knowledge as potential functions for potential-based reward shaping, which has a good property for maintaining policy optimality from task…
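The optimality-preserving property mentioned in the abstract comes from the standard form of potential-based reward shaping, where the agent receives r + γΦ(s') − Φ(s) in place of r. Below is a minimal Python sketch of that formula, not the authors' implementation. The names `shaped_reward`, `toy_potential`, and `discounted_shaping_sum` are hypothetical, the hand-written distance-based potential stands in for the LLM-derived potential described in the paper, and treating the terminal potential as zero is an assumed (though common) convention.

```python
from typing import Callable, Sequence


def shaped_reward(
    reward: float,
    state: int,
    next_state: int,
    potential: Callable[[int], float],
    gamma: float,
    done: bool = False,
) -> float:
    """Return r + gamma * Phi(s') - Phi(s).

    Assumption: the potential of a terminal state is treated as 0.
    """
    next_phi = 0.0 if done else potential(next_state)
    return reward + gamma * next_phi - potential(state)


def toy_potential(state: int, goal: int = 3) -> float:
    """Hypothetical potential: negative distance to a goal cell on a line.

    In the paper the potential is derived from LLM-extracted background
    knowledge; this hand-written function is only a stand-in.
    """
    return -float(abs(goal - state))


def discounted_shaping_sum(
    states: Sequence[int],
    potential: Callable[[int], float],
    gamma: float,
) -> float:
    """Discounted sum of the shaping bonuses along one episode.

    The environment reward is set to 0 so that only the shaping term
    gamma * Phi(s') - Phi(s) is accumulated.
    """
    total = 0.0
    last_step = len(states) - 2
    for t in range(len(states) - 1):
        bonus = shaped_reward(
            0.0, states[t], states[t + 1], potential, gamma, done=(t == last_step)
        )
        total += (gamma ** t) * bonus
    return total


if __name__ == "__main__":
    direct = [0, 1, 2, 3]
    detour = [0, 1, 0, 1, 2, 3]
    # Both values equal -Phi(s_0) up to floating-point error.
    print(discounted_shaping_sum(direct, toy_potential, 0.9))
    print(discounted_shaping_sum(detour, toy_potential, 0.9))
```

Under these assumptions the shaping bonuses telescope: every episode that starts in the same state collects the same total bonus, −Φ(s₀), regardless of the path taken. The ranking of policies by return is therefore unchanged, while the per-step signal is denser.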
Taxonomy
Topics: Speech and dialogue systems · Topic Modeling · Reinforcement Learning in Robotics
