MIRA: Memory-Integrated Reinforcement Learning Agent with Limited LLM Guidance

Narjes Nourzad; Carlee Joe-Wong

arXiv:2602.17930·cs.LG·February 23, 2026

TL;DR

MIRA introduces a memory-augmented reinforcement learning framework that leverages structured memory graphs to reduce reliance on continuous LLM supervision, improving early learning in sparse reward environments.

Contribution

MIRA integrates a structured memory graph with RL to amortize LLM queries, enhancing early learning without continuous supervision and providing theoretical guarantees.

Findings

01. MIRA outperforms baseline RL methods in sparse reward tasks.

02. MIRA achieves comparable performance to LLM-supervised approaches with fewer LLM queries.

03. The utility-based shaping accelerates early-stage learning in sparse environments.
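
To make the third finding concrete, here is a minimal sketch of how a utility signal might be blended into a policy-gradient advantage early in training and then faded out. The additive form, the linear annealing schedule, and the names `shaping_weight`, `shaped_advantage`, `decay_steps`, and `beta0` are all assumptions made for illustration; this page does not give the paper's actual shaping rule.

```python
def shaping_weight(step: int, decay_steps: int, beta0: float = 1.0) -> float:
    """Hypothetical schedule: anneal the weight linearly from beta0 to 0."""
    return beta0 * max(0.0, 1.0 - step / decay_steps)


def shaped_advantage(advantage: float, utility: float, step: int,
                     decay_steps: int, beta0: float = 1.0) -> float:
    """Add a memory-derived utility bonus to the advantage (assumed form).

    The bonus dominates while environment rewards are still sparse and
    vanishes once step >= decay_steps, leaving the plain advantage.
    """
    return advantage + shaping_weight(step, decay_steps, beta0) * utility


# Early, mid, and late training with a fixed utility of 2.0.
early = shaped_advantage(0.0, 2.0, step=0, decay_steps=100)
mid = shaped_advantage(0.0, 2.0, step=50, decay_steps=100)
late = shaped_advantage(0.3, 2.0, step=200, decay_steps=100)
```

Under these assumptions the guidance only influences early updates, which matches the finding's emphasis on early-stage learning without committing the agent to the memory signal in the long run.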

Abstract

Reinforcement learning (RL) agents often suffer from high sample complexity in sparse or delayed reward settings due to limited prior structure. Large language models (LLMs) can provide subgoal decompositions, plausible trajectories, and abstract priors that facilitate early learning. However, heavy reliance on LLM supervision introduces scalability constraints and dependence on potentially unreliable signals. We propose MIRA (Memory-Integrated Reinforcement Learning Agent), which incorporates a structured, evolving memory graph to guide early training. The graph stores decision-relevant information, including trajectory segments and subgoal structures, and is constructed from both the agent's high-return experiences and LLM outputs. This design amortizes LLM queries into a persistent memory rather than requiring continuous real-time supervision. From this memory graph, we derive a…
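
The amortization idea in the abstract can be illustrated with a small sketch: a graph that stores both LLM-suggested subgoals and segments from the agent's own high-return episodes, and that only calls the LLM when a state has no stored entry and a query budget remains. Everything here is hypothetical (the class `MemoryGraph`, the `llm_budget` and `return_threshold` parameters, the `query_llm` callable, and the string states); it is not the paper's data structure or API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Hashable, List, Tuple


@dataclass
class MemoryGraph:
    """Toy memory graph: nodes are states, edges are stored transitions."""
    llm_budget: int            # hypothetical cap on total LLM calls
    return_threshold: float    # hypothetical cutoff for a "high-return" episode
    edges: Dict[Hashable, List[Tuple[Hashable, str]]] = field(default_factory=dict)
    llm_calls: int = 0

    def add_segment(self, src: Hashable, dst: Hashable, source: str) -> None:
        """Store one edge, tagged with where it came from ("agent" or "llm")."""
        bucket = self.edges.setdefault(src, [])
        if (dst, source) not in bucket:
            bucket.append((dst, source))

    def add_episode(self, states: List[Hashable], episode_return: float) -> None:
        """Keep consecutive state pairs only when the episode return is high."""
        if episode_return < self.return_threshold:
            return
        for src, dst in zip(states, states[1:]):
            self.add_segment(src, dst, "agent")

    def suggest(self, state: Hashable,
                query_llm: Callable[[Hashable], List[Hashable]]) -> List[Hashable]:
        """Answer from memory; query the LLM only on a miss and within budget."""
        if state in self.edges:
            return [dst for dst, _ in self.edges[state]]
        if self.llm_calls >= self.llm_budget:
            return []
        self.llm_calls += 1
        subgoals = query_llm(state)
        self.edges.setdefault(state, [])  # cache even an empty answer
        for subgoal in subgoals:
            self.add_segment(state, subgoal, "llm")
        return list(subgoals)


def fake_llm(state: Hashable) -> List[Hashable]:
    """Stand-in for a real LLM call; returns one made-up subgoal."""
    return [f"{state}->key"]


graph = MemoryGraph(llm_budget=1, return_threshold=1.0)
first = graph.suggest("start", fake_llm)   # miss: uses the single LLM call
second = graph.suggest("start", fake_llm)  # hit: served from memory
```

In this sketch the second lookup for the same state costs no LLM call, which is the sense in which queries are amortized into persistent memory rather than issued at every step.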

Taxonomy

Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Machine Learning and Data Classification