LLM-Guided Reinforcement Learning: Addressing Training Bottlenecks through Policy Modulation

Heng Tan; Hua Yan; Yu Yang

arXiv:2505.20671·cs.AI·May 28, 2025

TL;DR

This paper introduces an RL training method in which a large language model identifies critical states and suggests actions for them, improving policy learning without additional model training or human input.

Contribution

We propose an LLM-guided policy modulation framework that enhances RL training by leveraging LLMs to identify critical states and provide action suggestions, avoiding additional model training or human feedback.
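To make the idea concrete, here is a minimal sketch of the two steps named above: identifying critical states from a past trajectory, then using suggested actions to influence training. Everything in it is an illustrative assumption rather than the authors' implementation. The two `stub_*` functions stand in for LLM prompts (no model is called), the grid-world states and action names are hypothetical, and the reward bonus is just one simple way a suggestion could be fed back into training; the paper's actual modulation mechanism may differ.

```python
from typing import Callable, Dict, List, Tuple

State = Tuple[int, int]          # hypothetical grid-world position
Step = Tuple[State, str, float]  # (state, action, reward)


def stub_identify_critical_states(trajectory: List[Step], k: int = 2) -> List[State]:
    """Stand-in for an LLM prompt. Toy heuristic: the k lowest-reward steps."""
    ranked = sorted(trajectory, key=lambda step: step[2])
    return [state for state, _, _ in ranked[:k]]


def stub_suggest_action(state: State) -> str:
    """Stand-in for an LLM action suggestion. Toy rule: head toward x = 3."""
    x, _ = state
    return "right" if x < 3 else "up"


def build_suggestions(
    trajectory: List[Step],
    identify: Callable[[List[Step]], List[State]],
    suggest: Callable[[State], str],
) -> Dict[State, str]:
    """Map each critical state to the action suggested for it."""
    return {state: suggest(state) for state in identify(trajectory)}


def modulated_reward(
    state: State,
    action: str,
    env_reward: float,
    suggestions: Dict[State, str],
    bonus: float = 0.5,
) -> float:
    """Assumed modulation rule: add a bonus when the agent follows a suggestion."""
    if suggestions.get(state) == action:
        return env_reward + bonus
    return env_reward


# A short, hand-written trajectory from a hypothetical sub-optimal agent.
trajectory: List[Step] = [
    ((0, 0), "right", 0.0),
    ((1, 0), "left", -1.0),
    ((0, 0), "up", 0.0),
    ((0, 1), "down", -2.0),
]

suggestions = build_suggestions(
    trajectory, stub_identify_critical_states, stub_suggest_action
)
print(suggestions)
print(modulated_reward((1, 0), "right", 0.0, suggestions))
```

In a real setup the two stubs would be replaced by prompts to an LLM, and the modulated reward (or whatever signal the method actually uses) would be consumed by an ordinary RL update. Keeping the LLM behind plain callables means no extra model is trained, which matches the constraint stated above.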

Findings

1. Outperforms state-of-the-art baselines on standard RL benchmarks.

2. Effectively identifies critical states using LLM prompts.

3. Improves policy convergence and reward maximization.

Abstract

While reinforcement learning (RL) has achieved notable success in various domains, training effective policies for complex tasks remains challenging. Agents often converge to local optima and fail to maximize long-term rewards. Existing approaches to mitigate training bottlenecks typically fall into two categories: (i) Automated policy refinement, which identifies critical states from past trajectories to guide policy updates, but suffers from costly and uncertain model training; and (ii) Human-in-the-loop refinement, where human feedback is used to correct agent behavior, but this does not scale well to environments with large or continuous action spaces. In this work, we design a large language model-guided policy modulation framework that leverages LLMs to improve RL training without additional model training or human intervention. We first prompt an LLM to identify critical states…

Taxonomy

Topics: Reinforcement Learning in Robotics