Improving Sample Efficiency of Reinforcement Learning with Background Knowledge from Large Language Models

Fuxiang Zhang; Junyou Li; Yi-Chen Li; Zongzhang Zhang; Yang Yu; Deheng Ye

arXiv:2407.03964 · cs.CL · July 8, 2024

TL;DR

This paper presents a framework that leverages large language models to extract environment background knowledge, improving sample efficiency in reinforcement learning across various tasks by using knowledge-based reward shaping.

Contribution

It introduces a novel method to extract environment background knowledge from LLMs and applies it to enhance RL sample efficiency through potential-based reward shaping.

Findings

01. Significant sample efficiency improvements in Minigrid and Crafter tasks.

02. Effective knowledge extraction from LLMs via different prompting methods.

03. Knowledge-based reward shaping maintains policy optimality.

Abstract

Low sample efficiency is an enduring challenge of reinforcement learning (RL). With the advent of versatile large language models (LLMs), recent works impart common-sense knowledge to accelerate policy learning for RL processes. However, we note that such guidance is often tailored for one specific task but loses generalizability. In this paper, we introduce a framework that harnesses LLMs to extract background knowledge of an environment, which contains general understandings of the entire environment, making various downstream RL tasks benefit from one-time knowledge representation. We ground LLMs by feeding a few pre-collected experiences and requesting them to delineate background knowledge of the environment. Afterward, we represent the output knowledge as potential functions for potential-based reward shaping, which has a good property for maintaining policy optimality from task…
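
To make the reward-shaping step concrete, here is a minimal sketch of potential-based reward shaping in its standard form, where the shaping term is gamma * Phi(s') - Phi(s). It is not the authors' implementation: the state encoding, the hand-written `toy_potential`, and the helper names are all hypothetical stand-ins. In the paper the potential is derived from LLM-extracted background knowledge rather than written by hand.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical toy state: (has_key, door_open). Not the paper's state format.
State = Tuple[bool, bool]


def toy_potential(state: State) -> float:
    """Hand-written stand-in for a knowledge-derived potential function.

    Assumption for illustration only: holding the key is worth 1,
    an open door is worth 2 more.
    """
    has_key, door_open = state
    return float(has_key) + 2.0 * float(door_open)


def shaped_reward(
    reward: float,
    state: State,
    next_state: State,
    potential: Callable[[State], float],
    gamma: float = 0.99,
    done: bool = False,
) -> float:
    """Return reward + gamma * Phi(next_state) - Phi(state).

    The potential of a terminal state is treated as zero, so the shaping
    terms telescope over an episode.
    """
    next_potential = 0.0 if done else potential(next_state)
    return reward + gamma * next_potential - potential(state)


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Sum of gamma**t * r_t over a reward sequence."""
    return sum(r * gamma**t for t, r in enumerate(rewards))
```

Because the shaping terms telescope, the shaped return of any episode starting in state s differs from the original return by exactly Phi(s), a constant that does not depend on the actions taken. That is why this family of shaping functions leaves the ranking of policies unchanged while still giving the agent denser feedback.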

Code & Models

Repositories

mansicer/background-knowledge-rl
Official

Taxonomy

Topics: Speech and dialogue systems · Topic Modeling · Reinforcement Learning in Robotics