TL;DR
MemQ introduces a novel memory updating mechanism using TD($\lambda$) eligibility traces over provenance DAGs, significantly improving generalization and learning in memory-augmented LLM agents across diverse benchmarks.
Contribution
It formalizes a new framework for memory credit assignment in LLM agents using DAG-structured provenance and eligibility traces, enhancing memory utilization and task performance.
Findings
Achieves highest success rates on all six benchmarks tested.
Significant improvements on multi-step tasks with deep provenance chains.
Provides guidance for parameter selection based on DAG structure.
Abstract
Episodic memory allows LLM agents to accumulate and retrieve experience, but current methods treat each memory independently, evaluating retrieval quality in isolation without accounting for the dependency chains through which memories enable the creation of future memories. We introduce MemQ, which applies TD($\lambda$) eligibility traces to memory Q-values, propagating credit backward through a provenance DAG that records which memories were retrieved when each new memory was created. Credit weight decays with DAG depth, replacing temporal distance with structural proximity. We formalize the setting as an Exogenous-Context MDP, whose factored transition decouples the exogenous task stream from the endogenous memory store. Across six benchmarks, spanning OS interaction, function calling, code generation, multimodal reasoning, embodied reasoning, and…
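The backward credit propagation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The function name `propagate_credit`, the `parents` mapping, the step size `alpha`, and the decay weight `(gamma * lam) ** depth` (borrowed from standard TD($\lambda$) traces) are all assumptions made for illustration; the abstract as shown here does not give the exact decay formula or update rule.

```python
from collections import deque


def propagate_credit(parents, q, new_memory, td_error,
                     alpha=0.1, gamma=0.9, lam=0.8):
    """Spread one TD error backward through a provenance DAG.

    parents    : dict mapping a memory id to the ids retrieved when it was
                 created (hypothetical representation of the DAG).
    q          : dict of memory Q-values, updated in place.
    new_memory : id of the memory whose outcome produced td_error.

    Assumption: credit weight is (gamma * lam) ** depth, where depth is the
    shortest provenance distance from new_memory. The paper may differ.
    """
    # Breadth-first search gives each ancestor its shortest DAG depth.
    depth = {new_memory: 0}
    queue = deque([new_memory])
    while queue:
        node = queue.popleft()
        for parent in parents.get(node, []):
            if parent not in depth:
                depth[parent] = depth[node] + 1
                queue.append(parent)

    # Update every reachable memory with a depth-decayed share of the error.
    for node, d in depth.items():
        weight = (gamma * lam) ** d
        q[node] = q.get(node, 0.0) + alpha * weight * td_error
    return depth
```

Under these assumptions, a chain in which `m3` was created from `m2`, and `m2` from `m1`, gives `m1` a smaller update than `m2`, because `m1` sits one level deeper in the provenance DAG. The number of elapsed time steps plays no role in the weighting.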
