Reward Guidance for Reinforcement Learning Tasks Based on Large Language Models: The LMGT Framework
Yongxin Deng, Xihe Qiu, Jue Chen, Xiaoyu Tan

TL;DR
The paper introduces LMGT, a framework that uses large language models to guide reward tuning in reinforcement learning, improving sample efficiency and reducing computational costs, especially in environments with sparse rewards.
Contribution
LMGT leverages LLMs' prior knowledge to balance exploration and exploitation in RL, a novel approach that enhances learning efficiency in sparse reward settings.
Findings
LMGT outperforms baseline methods across various RL tasks.
It significantly reduces the computational resources required during training.
It is effective in robotic control environments such as Housekeep.
Abstract
The inherent uncertainty in the environmental transition model of Reinforcement Learning (RL) necessitates a delicate balance between exploration and exploitation. This balance is crucial for optimizing computational resources to accurately estimate expected rewards for the agent. In scenarios with sparse rewards, such as robotic control systems, achieving this balance is particularly challenging. However, given that many environments possess extensive prior knowledge, learning from the ground up in such contexts may be redundant. To address this issue, we propose Language Model Guided reward Tuning (LMGT), a novel, sample-efficient framework. LMGT leverages the comprehensive prior knowledge embedded in Large Language Models (LLMs) and their proficiency in processing non-standard data forms, such as wiki tutorials. By utilizing LLM-guided reward shifts, LMGT adeptly balances exploration…
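The idea of a reward shift can be illustrated with a minimal sketch. This is not the authors' implementation: the names `shifted_reward`, `q_update`, and `stub_evaluator` are hypothetical, the evaluator is a hard-coded stand-in for an LLM call, and the +1/-1 scoring scale and tabular Q-learning update are assumptions chosen only to keep the example small.

```python
from typing import Callable, Dict, Tuple

State = int
Action = int
QTable = Dict[Tuple[State, Action], float]


def stub_evaluator(state: State, action: Action) -> int:
    """Hypothetical stand-in for an LLM judgment of a (state, action) pair.

    Assumption: +1 means the action looks sensible given prior knowledge,
    -1 means it looks unhelpful. A real system would query a language model.
    """
    return 1 if action == 1 else -1


def shifted_reward(
    env_reward: float,
    state: State,
    action: Action,
    evaluator: Callable[[State, Action], int],
) -> float:
    """Add the evaluator's score to the environment reward."""
    return env_reward + float(evaluator(state, action))


def q_update(
    q: QTable,
    state: State,
    action: Action,
    reward: float,
    next_state: State,
    n_actions: int,
    alpha: float = 0.5,
    gamma: float = 0.9,
) -> float:
    """One tabular Q-learning step using the (already shifted) reward."""
    best_next = max(q.get((next_state, a), 0.0) for a in range(n_actions))
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]


if __name__ == "__main__":
    q: QTable = {}
    # The environment reward is 0 (sparse), so only the shift carries signal.
    r = shifted_reward(0.0, state=0, action=1, evaluator=stub_evaluator)
    print(q_update(q, 0, 1, r, next_state=1, n_actions=2))
```

In this sketch the environment gives no reward on the step, so the evaluator's score is the only learning signal; that is the sparse-reward situation the abstract describes.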
Taxonomy
Topics: Reinforcement Learning in Robotics · Topic Modeling
