Memento: Fine-tuning LLM Agents without Fine-tuning LLMs
Huichi Zhou, Yihang Chen, Siyuan Guo, Xue Yan, Kin Hei Lee, Zihan Wang, Ka Yiu Lee, Guchun Zhang, Kun Shao, Linyi Yang, Jun Wang

TL;DR
Memento introduces a memory-based online reinforcement learning approach that lets LLM agents adapt continually without fine-tuning the underlying LLMs, achieving state-of-the-art results on deep research and out-of-distribution tasks.
Contribution
The paper presents a memory-augmented MDP framework that lets LLM agents adapt continually without gradient-based fine-tuning, and it outperforms existing training-based methods on out-of-distribution tasks.
Findings
Achieves top-1 on the GAIA validation set with 87.88% Pass@3
Reaches 79.40% on the GAIA test set, outperforming the prior state of the art
Outperforms training-based methods on out-of-distribution tasks
Abstract
In this paper, we introduce a novel learning paradigm for Adaptive Large Language Model (LLM) agents that eliminates the need for fine-tuning the underlying LLMs. Existing approaches are often either rigid, relying on static, handcrafted reflection workflows, or computationally intensive, requiring gradient updates of LLM model parameters. In contrast, our method enables low-cost continual adaptation via memory-based online reinforcement learning. We formalise this as a Memory-augmented Markov Decision Process (M-MDP), equipped with a neural case-selection policy to guide action decisions. Past experiences are stored in an episodic memory, either differentiable or non-parametric. The policy is continually updated based on environmental feedback through a memory rewriting mechanism, whereas policy improvement is achieved through efficient memory reading (retrieval). We instantiate our…
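The memory write and read steps described in the abstract can be illustrated with a minimal sketch of a non-parametric episodic memory. This is not the authors' implementation: the names `Case`, `CaseMemory`, `embed`, and `cosine` are hypothetical, the bag-of-words embedding stands in for a real text encoder, and plain similarity ranking stands in for the neural case-selection policy the abstract mentions.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Case:
    state: str     # task description the agent faced
    action: str    # plan or action the agent took
    reward: float  # environmental feedback, e.g. 1.0 for success


def embed(text):
    """Toy bag-of-words embedding (assumption: stands in for a real encoder)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class CaseMemory:
    cases: list = field(default_factory=list)

    def write(self, state, action, reward):
        """Memory rewriting: store a new experience with its feedback."""
        self.cases.append(Case(state, action, reward))

    def read(self, state, k=2):
        """Memory reading: return the k stored cases most similar to `state`."""
        query = embed(state)
        ranked = sorted(
            self.cases,
            key=lambda c: cosine(query, embed(c.state)),
            reverse=True,
        )
        return ranked[:k]


memory = CaseMemory()
memory.write("find the population of Paris", "search web then extract number", 1.0)
memory.write("sum the numbers in a csv file", "run python code", 1.0)
memory.write("find the population of Berlin", "guess from memory", 0.0)

# Retrieved cases (successes and failures alike) would be placed in the
# frozen LLM's prompt, so behaviour changes without any gradient update.
retrieved = memory.read("find the population of Rome", k=2)
for case in retrieved:
    print(case.state, "->", case.action, "| reward:", case.reward)
```

In this sketch the LLM's weights are never touched; adaptation comes only from what is written to and read from `memory`. A learned selection policy would replace the fixed similarity ranking in `read`.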
