Memory-Based Advantage Shaping for LLM-Guided Reinforcement Learning

Narjes Nourzad; Carlee Joe-Wong

arXiv:2602.17931 · cs.LG · February 23, 2026

Open Access

TL;DR

This paper introduces a memory-based advantage shaping method for LLM-guided reinforcement learning that improves sample efficiency by leveraging a memory graph of subgoals and trajectories, reducing reliance on frequent LLM calls.

Contribution

The authors propose a novel memory graph approach that encodes subgoals and trajectories to guide RL, minimizing the need for continuous LLM interaction and enhancing learning efficiency.

Findings

01. Improved sample efficiency in benchmark environments.

02. Faster early learning compared to baseline RL methods.

03. Final performance comparable to methods with frequent LLM use.

Abstract

In environments with sparse or delayed rewards, reinforcement learning (RL) incurs high sample complexity due to the large number of interactions needed for learning. This limitation has motivated the use of large language models (LLMs) for subgoal discovery and trajectory guidance. While LLMs can support exploration, frequent reliance on LLM calls raises concerns about scalability and reliability. We address these challenges by constructing a memory graph that encodes subgoals and trajectories from both LLM guidance and the agent's own successful rollouts. From this graph, we derive a utility function that evaluates how closely the agent's trajectories align with prior successful strategies. This utility shapes the advantage function, providing the critic with additional guidance without altering the reward. Our method relies primarily on offline input and only occasional online…
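The abstract does not give the utility function or the shaping rule in closed form, so the following is only a minimal sketch of the general idea, not the authors' formulation. It assumes a hypothetical `MemoryGraph` whose nodes are subgoal labels and whose edges are subgoal transitions seen in stored trajectories, a utility defined as the fraction of a new trajectory's transitions already present in the graph, and an additive bonus with a hypothetical weight `beta`. The subgoal names are illustrative.

```python
from collections import defaultdict


class MemoryGraph:
    """Hypothetical memory graph: nodes are subgoal labels, edges count observed transitions."""

    def __init__(self):
        # Maps (source_subgoal, next_subgoal) to the number of times it was stored.
        self.edges = defaultdict(int)

    def add_trajectory(self, subgoals):
        """Store a subgoal sequence, e.g. from LLM guidance or a successful rollout."""
        for src, dst in zip(subgoals, subgoals[1:]):
            self.edges[(src, dst)] += 1

    def utility(self, subgoals):
        """Assumed utility: fraction of consecutive transitions already in the graph."""
        pairs = list(zip(subgoals, subgoals[1:]))
        if not pairs:
            return 0.0
        hits = sum(1 for pair in pairs if pair in self.edges)
        return hits / len(pairs)


def shape_advantages(advantages, utility, beta=0.5):
    """Add a utility bonus to each advantage estimate; rewards are left untouched."""
    return [a + beta * utility for a in advantages]


# Illustrative usage with made-up subgoals and advantage values.
graph = MemoryGraph()
graph.add_trajectory(["key", "door", "goal"])

rollout = ["key", "door", "lava"]
u = graph.utility(rollout)
shaped = shape_advantages([1.0, -1.0], u, beta=0.5)
```

Because the bonus enters only through the advantage estimate, the environment reward stays as it is, which matches the abstract's statement that the method shapes the advantage without altering the reward.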

Taxonomy

Topics: Reinforcement Learning in Robotics · Multimodal Machine Learning Applications · Topic Modeling