LLM-Guided Reinforcement Learning: Addressing Training Bottlenecks through Policy Modulation
Heng Tan, Hua Yan, Yu Yang

TL;DR
This paper introduces an RL training method in which a large language model identifies critical states and suggests actions for them, improving policy learning without additional model training or human input.
Contribution
We propose an LLM-guided policy modulation framework that enhances RL training by leveraging LLMs to identify critical states and provide action suggestions, avoiding additional model training or human feedback.
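The two roles described here (flagging critical states, then suggesting actions for them) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the summary does not specify the prompt format, how suggestions are fed back into training, or any function names. `stub_llm` is a hypothetical placeholder heuristic standing in for a real LLM call, and the reward-shaping rule in `modulate_rewards` is one plausible way to apply suggestions, not necessarily the authors' modulation mechanism.

```python
from typing import Dict, List, Tuple

State = int
Action = int
Step = Tuple[State, Action, float]  # (state, action, reward)


def format_prompt(trajectory: List[Step]) -> str:
    """Render a trajectory as text that could be sent to an LLM (hypothetical format)."""
    lines = [
        f"t={t}: state={s}, action={a}, reward={r}"
        for t, (s, a, r) in enumerate(trajectory)
    ]
    header = "Identify the critical states and suggest a better action for each:"
    return header + "\n" + "\n".join(lines)


def stub_llm(trajectory: List[Step]) -> Dict[State, Action]:
    """Hypothetical stand-in for an LLM reply.

    Assumption: a state is treated as critical if its reward is negative,
    and the suggested action is always 0. A real system would parse the
    LLM's text response instead of using this heuristic.
    """
    return {s: 0 for (s, _a, r) in trajectory if r < 0}


def modulate_rewards(
    trajectory: List[Step],
    suggestions: Dict[State, Action],
    bonus: float = 1.0,
) -> List[Step]:
    """Shape rewards at critical states (an assumed modulation rule).

    Adds `bonus` when the agent's action matches the suggestion at a
    critical state and subtracts it otherwise; other steps are unchanged.
    """
    shaped = []
    for s, a, r in trajectory:
        if s in suggestions:
            r = r + bonus if a == suggestions[s] else r - bonus
        shaped.append((s, a, r))
    return shaped


# Toy trajectory: state 1 received a negative reward after taking action 1.
trajectory = [(0, 1, 0.0), (1, 1, -1.0), (2, 0, 1.0)]
suggestions = stub_llm(trajectory)
shaped = modulate_rewards(trajectory, suggestions)
```

The shaped trajectory could then be passed to any standard policy-update routine in place of the original one. Keeping the LLM behind a single function makes it easy to swap the stub for a real model call without touching the training loop.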
Findings
Outperforms state-of-the-art baselines on standard RL benchmarks.
Effectively identifies critical states using LLM prompts.
Improves policy convergence and reward maximization.
Abstract
While reinforcement learning (RL) has achieved notable success in various domains, training effective policies for complex tasks remains challenging. Agents often converge to local optima and fail to maximize long-term rewards. Existing approaches to mitigate training bottlenecks typically fall into two categories: (i) Automated policy refinement, which identifies critical states from past trajectories to guide policy updates, but suffers from costly and uncertain model training; and (ii) Human-in-the-loop refinement, where human feedback is used to correct agent behavior, but this does not scale well to environments with large or continuous action spaces. In this work, we design a large language model-guided policy modulation framework that leverages LLMs to improve RL training without additional model training or human intervention. We first prompt an LLM to identify critical states…
Taxonomy
Topics: Reinforcement Learning in Robotics
