From Experience to Strategy: Empowering LLM Agents with Trainable Graph Memory
Siyu Xia, Zekun Xu, Jiajun Chai, Wentian Fan, Yan Song, Xiaohan Wang, Guojun Yin, Wei Lin, Haifeng Zhang, Jun Wang

TL;DR
This paper introduces a trainable graph memory framework for LLM agents that enhances their reasoning by structuring experiences into decision paths and optimizing strategic meta-cognition through reinforcement learning.
Contribution
It presents a novel, trainable, multi-layered graph memory that improves LLM agent reasoning and adaptability by integrating structured experience and meta-cognitive strategies.
Findings
Improves strategic reasoning performance of LLM agents.
Enhances generalization and robustness during RL training.
Provides interpretable decision paths and meta-cognitive strategies.
Abstract
Large Language Model (LLM)-based agents have demonstrated remarkable potential in autonomous task solving across complex, open-ended environments. A promising approach to improving the reasoning capabilities of LLM agents is to better utilize prior experience to guide current decisions. However, LLMs acquire experience either through implicit memory via training, which suffers from catastrophic forgetting and limited interpretability, or through explicit memory via prompting, which lacks adaptability. In this paper, we introduce a novel agent-centric, trainable, multi-layered graph memory framework and evaluate how context memory enhances the ability of LLMs to utilize parametric information. The graph abstracts raw agent trajectories into structured decision paths in a state machine and further distills them into high-level, human-interpretable strategic meta-cognition. In order to make…
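The abstract's pipeline (raw trajectory, then a decision path in a state machine, then meta-cognition) can be illustrated with a small Python sketch. It is not the authors' implementation: the FSM state names, the keyword rule in `classify_step`, the `GraphMemory` class, and the word-overlap (Jaccard) retrieval are all hypothetical stand-ins chosen for illustration. The sketch shows one plausible reading of "keeping only decision points": consecutive steps that map to the same state collapse into a single path node, and strategies are attached to the abstracted path rather than to the raw trace.

```python
from dataclasses import dataclass, field


def classify_step(action: str) -> str:
    """Map a raw action to an FSM state via a crude keyword rule.

    Hypothetical: the paper's actual FSM design is not shown on this page.
    States used here: PLAN, SEARCH, READ, VERIFY, ANSWER.
    """
    a = action.lower()
    if a.startswith("search"):
        return "SEARCH"
    if a.startswith(("lookup", "read")):
        return "READ"
    if a.startswith("check"):
        return "VERIFY"
    if a.startswith("finish"):
        return "ANSWER"
    return "PLAN"


def trace_to_fsm_path(actions: list[str]) -> tuple[str, ...]:
    """Abstract a raw trajectory into a decision path by keeping only state changes."""
    path: list[str] = []
    for action in actions:
        state = classify_step(action)
        if not path or path[-1] != state:
            path.append(state)
    return tuple(path)


@dataclass
class GraphMemory:
    """Hypothetical three-layer memory: query -> FSM path -> weighted meta-cognitions."""
    query_to_path: dict[str, tuple[str, ...]] = field(default_factory=dict)
    path_to_meta: dict[tuple[str, ...], dict[str, float]] = field(default_factory=dict)

    def add(self, query: str, actions: list[str], meta: str, weight: float = 1.0) -> None:
        """Store a solved query, its abstracted path, and a strategy attached to that path."""
        path = trace_to_fsm_path(actions)
        self.query_to_path[query] = path
        self.path_to_meta.setdefault(path, {})[meta] = weight

    def retrieve(self, query: str, k: int = 2) -> list[str]:
        """Return the top-k strategies linked to the most word-overlapping stored query."""
        if not self.query_to_path:
            return []
        words = set(query.lower().split())

        def jaccard(stored: str) -> float:
            other = set(stored.lower().split())
            union = words | other
            return len(words & other) / len(union) if union else 0.0

        best = max(self.query_to_path, key=jaccard)
        metas = self.path_to_meta.get(self.query_to_path[best], {})
        return sorted(metas, key=metas.get, reverse=True)[:k]


if __name__ == "__main__":
    actions = ["Search[capital of France]", "Lookup[Paris]", "Search[Paris river]", "Finish[Seine]"]
    print(trace_to_fsm_path(actions))  # ('SEARCH', 'READ', 'SEARCH', 'ANSWER')
```

Because strategies are keyed by the abstracted path, two differently worded trajectories that follow the same decision sequence share the same meta-cognitions; that sharing is the property this toy example is meant to highlight.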
Peer Reviews
Decision: ICLR 2026 Conference Desk Rejected Submission
1. The paper is clearly motivated, and the proposed solution of constructing a metacognition graph is very intuitive. 2. The method is explained in a clear way with enough details. 3. The experimental evaluation is thorough, with consistent improvement on a broad range of tasks.
See questions below.
- The factorisation of experience into strategy: the Q → FSM path → meta-cognition separation is a crisp abstraction that makes strategy-level retrieval interpretable. Combined with RL that lets the LLM learn which strategies matter, it goes beyond prior static memories (e.g., EXPEL/A-MEM).
- FSM design and mapping are pushed to the Appendix, as are the meta-cognition induction examples. The main text lacks one complete HotpotQA example showing: raw trace → FSM path → meta-cognition → prompt. Without it, claims like "preserves only semantically meaningful decision points" (lines 243–244) are unverifiable. - Sec. 5.2 says "Detailed baseline configurations are provided in Appendix B.2," but the baseline definitions (e.g., "Direct Trajectory," "A-MEM," "EXPEL," "ITR") and how they diffe
1. The framework is clearly described, making it easy to understand both its overall design and the role of each component. The key methods are thoroughly explained and supported with equations. 2. The experiments are well designed and make use of state-of-the-art models. The selected tasks are thoughtfully chosen, providing broad and representative coverage. The ablation study effectively demonstrates the contribution of each component.
1. The paper lacks an analysis of time consumption. Although the proposed method performs well as reported, there is no discussion of the additional computational overhead it may introduce. The method could potentially involve significant computational costs, which might limit its practical adoption compared to more efficient alternatives. 2. The paper claims that employing a structured graph memory can “distill agent trajectories into high-level, human-interpretable strategic meta-cognition.”
1. The proposed approach introduces a novel way of leveraging a graph structure to represent logical connections between user queries, intermediate states, and abstracted experiences. This structured memory facilitates reasoning and strategic decision-making. 2. The framework enables both the memory graph weights and the policy model to be jointly optimized in an end-to-end manner using RL. This ensures that the memory graph is dynamically updated to reflect the latest logical flow, while the polic
1. **Limited Discussion of Related Work** The paper lacks a detailed comparison with prior works, such as Expel and G-Memory. * Compared to Expel, which also explicitly abstracts experiences and insights from past trials, the differences seem to lie in both the model's parametric updates and the structured representation of experiences. * For G-Memory, both approaches utilize graph structures for memory representation, but the distinctions between these methods are not clearly articu
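One reviewer notes that the memory graph weights are optimized with RL alongside the policy. The exact objective is not given on this page, so the following Python sketch uses plain REINFORCE with a running-mean baseline as a stand-in: each candidate meta-cognition edge has a logit, a softmax over the logits decides which strategy is injected, and the episode's 0/1 task reward updates the logits. The strategy names, success probabilities, learning rate, and the `train` simulator are hypothetical, and only the memory-side update is shown, not the policy-model update.

```python
import math
import random


def softmax(logits: dict[str, float]) -> dict[str, float]:
    """Numerically stable softmax over strategy-edge logits."""
    m = max(logits.values())
    exps = {k: math.exp(v - m) for k, v in logits.items()}
    z = sum(exps.values())
    return {k: e / z for k, e in exps.items()}


def reinforce_update(logits: dict[str, float], chosen: str, reward: float,
                     baseline: float, lr: float = 0.1) -> dict[str, float]:
    """One REINFORCE step: d log pi(chosen) / d logit_k = 1[k == chosen] - pi_k."""
    probs = softmax(logits)
    adv = reward - baseline
    return {k: v + lr * adv * ((1.0 if k == chosen else 0.0) - probs[k])
            for k, v in logits.items()}


def train(success_prob: dict[str, float], steps: int = 3000,
          lr: float = 0.1, seed: int = 0) -> dict[str, float]:
    """Hypothetical simulator: sample a strategy, observe a 0/1 reward, update its edge."""
    rng = random.Random(seed)
    logits = {k: 0.0 for k in success_prob}
    baseline = 0.0
    for _ in range(steps):
        probs = softmax(logits)
        names = list(probs)
        chosen = rng.choices(names, weights=[probs[n] for n in names])[0]
        reward = 1.0 if rng.random() < success_prob[chosen] else 0.0
        logits = reinforce_update(logits, chosen, reward, baseline, lr)
        baseline += 0.05 * (reward - baseline)  # running-mean reward baseline
    return logits


if __name__ == "__main__":
    learned = train({"decompose-first": 0.9, "search-directly": 0.2, "guess": 0.1})
    print(max(learned, key=learned.get))
```

In this toy setting the logit of the strategy with the highest simulated success rate grows fastest, so the memory learns which strategies matter from reward rather than from hand-set weights; the baseline only reduces variance and does not change the expected update direction.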
Taxonomy
Topics: Advanced Graph Neural Networks · Explainable Artificial Intelligence (XAI) · Multimodal Machine Learning Applications
