LLM-Explorer: A Plug-in Reinforcement Learning Policy Exploration Enhancement Driven by Large Language Models

Qianyue Hao; Yiwen Song; Qingmin Liao; Jian Yuan; Yong Li

arXiv:2505.15293·cs.LG·October 24, 2025

TL;DR

This paper introduces LLM-Explorer, a plug-in module that uses large language models to generate adaptive, task-specific exploration strategies for reinforcement learning, improving performance by up to 37.27% on Atari and MuJoCo benchmarks.

Contribution

The paper presents a method that uses LLMs to dynamically generate and update exploration strategies in RL, tailored to each task and learning stage, in contrast to traditional exploration based on preset stochastic processes.

Findings

01 Achieved up to 37.27% performance improvement on benchmarks

02 Demonstrated compatibility with multiple RL algorithms

03 Validated effectiveness through extensive experiments on Atari and MuJoCo

Abstract

Policy exploration is critical in reinforcement learning (RL), where existing approaches include ε-greedy, Gaussian process, etc. However, these approaches rely on preset stochastic processes and are applied indiscriminately to all kinds of RL tasks, without considering the task-specific features that influence policy exploration. Moreover, during RL training, the evolution of such stochastic processes is rigid, typically incorporating only a decay in the variance, and fails to adjust flexibly to the agent's real-time learning status. Inspired by the analysis and reasoning capabilities of large language models (LLMs), we design LLM-Explorer to adaptively generate task-specific exploration strategies with LLMs, enhancing policy exploration in RL. In our design, we sample the learning trajectory of the agent during RL training in a given task and prompt the LLM to analyze the…
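For readers unfamiliar with the preset stochastic exploration that the abstract contrasts against, the following is a minimal sketch of two textbook baselines: ε-greedy action selection with a fixed linear decay, and additive Gaussian action noise. It is not code from the paper. The function names, the linear schedule, and all default values (`start`, `end`, `decay_steps`, the action bounds) are assumptions chosen for illustration.

```python
import random


def epsilon_schedule(step, start=1.0, end=0.05, decay_steps=10_000):
    """Linearly anneal epsilon from `start` to `end` over `decay_steps` steps.

    The schedule depends only on the step count, not on how well the agent
    is learning, which is the rigidity the abstract points out.
    All default values here are illustrative assumptions.
    """
    frac = min(max(step / decay_steps, 0.0), 1.0)
    return start + frac * (end - start)


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a uniformly random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


def gaussian_action(mean_action, sigma, low=-1.0, high=1.0, rng=random):
    """Add zero-mean Gaussian noise to a continuous action, then clip to bounds."""
    return [min(max(a + rng.gauss(0.0, sigma), low), high) for a in mean_action]
```

In both helpers the only quantity that changes during training is the noise scale (`epsilon` or `sigma`), and it changes on a schedule fixed before training starts. This is the kind of task-agnostic behavior that an adaptive, task-specific strategy is meant to replace.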

Videos

LLM-Explorer: A Plug-in Reinforcement Learning Policy Exploration Enhancement Driven by Large Language Models · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Software Engineering Research · Machine Learning and Data Classification

Methods: Q-Learning · Weight Decay · Adam · Dense Connections · Deep Q-Network · Experience Replay · Target Policy Smoothing · Convolution · Batch Normalization