LLM-Explorer: A Plug-in Reinforcement Learning Policy Exploration Enhancement Driven by Large Language Models
Qianyue Hao, Yiwen Song, Qingmin Liao, Jian Yuan, Yong Li

TL;DR
This paper introduces LLM-Explorer, a plug-in module that leverages large language models to generate adaptive, task-specific exploration strategies in reinforcement learning, improving performance on Atari and MuJoCo benchmark tasks.
Contribution
The paper presents a method that uses LLMs to dynamically generate and update exploration strategies in RL, tailored to each task and to the agent's current learning stage, in contrast to traditional exploration based on fixed, preset stochastic processes.
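The idea of updating exploration per task and per learning stage can be illustrated with a minimal plug-in sketch, assuming ε-greedy action selection. Everything here is hypothetical and not the authors' implementation: `AdaptiveExplorer`, `ExplorationParams`, `analyze_fn`, the `update_every` interval and the summary fields are illustrative names, and `toy_analyzer` is a rule-based placeholder standing in for an actual LLM query.

```python
import random
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ExplorationParams:
    epsilon: float = 1.0  # probability of taking a uniformly random action


@dataclass
class AdaptiveExplorer:
    """Hypothetical plug-in wrapper: holds exploration parameters and periodically
    asks an external analyzer (a stand-in for an LLM query) to replace them."""
    analyze_fn: Callable[[dict], ExplorationParams]
    update_every: int = 10  # episodes between analyzer calls (assumed value)
    params: ExplorationParams = field(default_factory=ExplorationParams)
    _returns: list = field(default_factory=list)

    def select_action(self, q_values: list[float], rng: random.Random) -> int:
        # Epsilon-greedy using whatever epsilon the analyzer last produced.
        if rng.random() < self.params.epsilon:
            return rng.randrange(len(q_values))
        return max(range(len(q_values)), key=q_values.__getitem__)

    def end_episode(self, episode_return: float) -> None:
        # Record the return; every `update_every` episodes, summarize the
        # recent learning status and let the analyzer set new parameters.
        self._returns.append(episode_return)
        if len(self._returns) % self.update_every == 0:
            recent = self._returns[-self.update_every:]
            summary = {
                "episodes_seen": len(self._returns),
                "mean_return": sum(recent) / len(recent),
                "current_epsilon": self.params.epsilon,
            }
            self.params = self.analyze_fn(summary)


def toy_analyzer(summary: dict) -> ExplorationParams:
    """Rule-based placeholder for an LLM: shrink epsilon faster when returns are positive."""
    factor = 0.5 if summary["mean_return"] > 0 else 0.9
    return ExplorationParams(epsilon=max(0.05, summary["current_epsilon"] * factor))
```

Because the explorer only exposes `select_action` and `end_episode`, the underlying learner (for example a DQN) is left untouched. This is one way to read the "plug-in" claim, not a description of the paper's actual interface.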
Findings
Achieved up to 37.27% performance improvement on benchmarks
Demonstrated compatibility with multiple RL algorithms
Validated effectiveness through extensive experiments on Atari and MuJoCo
Abstract
Policy exploration is critical in reinforcement learning (RL), where existing approaches include ε-greedy, Gaussian process, etc. However, these approaches utilize preset stochastic processes and are indiscriminately applied to all kinds of RL tasks without considering task-specific features that influence policy exploration. Moreover, during RL training, the evolution of such stochastic processes is rigid: it typically incorporates only a decay in the variance and fails to adjust flexibly according to the agent's real-time learning status. Inspired by the analyzing and reasoning capability of large language models (LLMs), we design LLM-Explorer to adaptively generate task-specific exploration strategies with LLMs, enhancing the policy exploration in RL. In our design, we sample the learning trajectory of the agent during the RL training in a given task and prompt the LLM to analyze the…
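The fixed exploration schedules that the abstract contrasts against can be sketched as follows. This is a minimal illustration, not code from the paper: the function names, the linear ε decay, the exponential σ decay and all default values are assumptions chosen for clarity. Each schedule depends only on the step counter, never on how the agent is actually learning, which is the rigidity the abstract describes.

```python
import random
from typing import Sequence


def linear_epsilon(step: int, start: float = 1.0, end: float = 0.05,
                   decay_steps: int = 10_000) -> float:
    """Preset epsilon schedule: decays linearly from `start` to `end`, then stays flat."""
    frac = min(step / decay_steps, 1.0)
    return start + frac * (end - start)


def epsilon_greedy(q_values: Sequence[float], epsilon: float, rng: random.Random) -> int:
    """With probability epsilon pick a uniformly random action, otherwise the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def gaussian_sigma(step: int, sigma0: float = 0.3, decay: float = 0.999,
                   sigma_min: float = 0.01) -> float:
    """Preset Gaussian-noise schedule: only the standard deviation shrinks over training."""
    return max(sigma0 * decay ** step, sigma_min)


def gaussian_explore(action: Sequence[float], sigma: float, rng: random.Random,
                     low: float = -1.0, high: float = 1.0) -> list[float]:
    """Add zero-mean Gaussian noise to a continuous action and clip it to the bounds."""
    return [min(max(a + rng.gauss(0.0, sigma), low), high) for a in action]
```

For example, `epsilon_greedy(q, linear_epsilon(step), rng)` explores with the same decaying ε in every task, whether or not the agent has already found a good policy.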
Taxonomy
Topics: Reinforcement Learning in Robotics · Software Engineering Research · Machine Learning and Data Classification
Methods: Q-Learning · Weight Decay · Adam · Dense Connections · Deep Q-Network · Experience Replay · Target Policy Smoothing · Convolution · Batch Normalization
