Reward Guidance for Reinforcement Learning Tasks Based on Large Language Models: The LMGT Framework

Yongxin Deng; Xihe Qiu; Jue Chen; Xiaoyu Tan

arXiv:2409.04744 · cs.LG · May 21, 2025

TL;DR

The paper introduces LMGT, a framework that uses large language models to guide reward tuning in reinforcement learning, improving sample efficiency and reducing computational costs, especially in environments with sparse rewards.

Contribution

LMGT uses the prior knowledge embedded in LLMs to balance exploration and exploitation in RL, which the authors present as a novel approach for improving learning efficiency in sparse-reward settings.

Findings

1. LMGT outperforms baseline methods across various RL tasks.
2. It significantly reduces the computational resources required during training.
3. It is effective in robotic control environments such as Housekeep.

Abstract

The inherent uncertainty in the environmental transition model of Reinforcement Learning (RL) necessitates a delicate balance between exploration and exploitation. This balance is crucial for optimizing computational resources to accurately estimate expected rewards for the agent. In scenarios with sparse rewards, such as robotic control systems, achieving this balance is particularly challenging. However, given that many environments possess extensive prior knowledge, learning from the ground up in such contexts may be redundant. To address this issue, we propose Language Model Guided reward Tuning (LMGT), a novel, sample-efficient framework. LMGT leverages the comprehensive prior knowledge embedded in Large Language Models (LLMs) and their proficiency in processing non-standard data forms, such as wiki tutorials. By utilizing LLM-guided reward shifts, LMGT adeptly balances exploration…
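
The "LLM-guided reward shift" idea in the abstract can be illustrated with a minimal sketch. It assumes the shift is simply added to the environment reward before a standard tabular Q-learning update; the abstract does not spell out the exact mechanism, so this is an illustration under that assumption and not the authors' implementation. The names `stub_llm_shift`, `guided_reward`, `q_update`, and the `scale` parameter are hypothetical, and the LLM is replaced by a fixed rule so the example runs without a model.

```python
from typing import Callable, List


def stub_llm_shift(state: int, action: int) -> float:
    """Hypothetical stand-in for an LLM evaluator.

    A real system would prompt a language model with a description of the
    transition. Here a fixed rule is used: action 1 is approved (+1.0),
    any other action is disapproved (-1.0).
    """
    return 1.0 if action == 1 else -1.0


def guided_reward(
    env_reward: float,
    state: int,
    action: int,
    shift_fn: Callable[[int, int], float] = stub_llm_shift,
    scale: float = 0.1,
) -> float:
    """Environment reward plus a scaled evaluator shift (assumed additive)."""
    return env_reward + scale * shift_fn(state, action)


def q_update(
    q: List[List[float]],
    state: int,
    action: int,
    reward: float,
    next_state: int,
    alpha: float = 0.5,
    gamma: float = 0.9,
) -> None:
    """Apply one standard tabular Q-learning update in place."""
    td_target = reward + gamma * max(q[next_state])
    q[state][action] += alpha * (td_target - q[state][action])


# Three states, two actions, all Q-values initialised to zero.
q_table = [[0.0, 0.0] for _ in range(3)]

# The environment reward is sparse (0.0 here), so the shift is the only signal.
r = guided_reward(env_reward=0.0, state=0, action=1)
q_update(q_table, state=0, action=1, reward=r, next_state=1)
```

With a sparse environment reward of zero, the shifted reward is the only learning signal on this transition, which is the situation where evaluator guidance would be expected to matter most. The `scale` factor is an assumed knob for keeping the shift small relative to the true task reward.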

Taxonomy

Topics: Reinforcement Learning in Robotics · Topic Modeling