What Happens Inside Agent Memory? Circuit Analysis from Emergence to Diagnosis
Xutao Mao, Jinman Zhao, Gerald Penn, Cong Wang

TL;DR
This paper investigates the internal circuit mechanisms of agent memory in large language models, revealing how control and content signals emerge and how silent failures can be diagnosed through circuit signatures.
Contribution
It uncovers the circuit-level dynamics of memory in LLMs and introduces an unsupervised diagnostic method for localizing memory failures.
Findings
Control signals are detectable before content signals in small models.
Shared hubs are recruited from existing model structures, not created anew.
An unsupervised diagnostic achieves 76.2% accuracy in localizing failures.
Abstract
Agent memory failures are silent: an LLM-based agent can produce a fluent response even when it fails to extract, retain, or retrieve the information needed across sessions. The write-manage-read loop describes the external pipeline of these systems but leaves open which internal computations implement each stage. Tracing feature circuits across the Qwen-3 family (0.6B to 14B) and two memory frameworks (mem0 and A-MEM), we report two mechanistic findings and one deliverable. First, control is detectable before content: routing circuitry is causally active at 0.6B, while content circuitry produces no detectable signal until 4B, exposing a deployment regime where small models route memory decisions before they can reliably extract or ground the underlying facts. Second, the shared hub is recruited, not created: Write and Read converge on a late-layer hub that already exists in the base…
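To make the write-manage-read loop and the notion of a silent failure concrete, here is a minimal toy sketch in Python. It is not the paper's method and does not use the mem0 or A-MEM APIs; `ToyMemory`, `respond`, the "key is value" extraction rule, and the eviction policy are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ToyMemory:
    """Hypothetical stand-in for an external agent memory store."""
    facts: Dict[str, str] = field(default_factory=dict)

    def write(self, text: str) -> None:
        # Write: extract a "key is value" fact from a session turn.
        # A turn that does not match the pattern is dropped without any error.
        key, sep, value = text.partition(" is ")
        if sep:
            self.facts[key.strip().lower()] = value.strip()

    def manage(self, max_items: int = 2) -> None:
        # Manage: evict the oldest facts once the store exceeds max_items.
        while len(self.facts) > max_items:
            oldest = next(iter(self.facts))
            del self.facts[oldest]

    def read(self, query: str) -> Optional[str]:
        # Read: retrieve the stored value for a key, if one exists.
        return self.facts.get(query.strip().lower())


def respond(memory: ToyMemory, query: str) -> str:
    value = memory.read(query)
    # The reply is fluent either way; nothing flags a missing memory.
    if value is not None:
        return f"Sure, it's {value}."
    return "Sure, happy to help with that."
```

In this sketch a failure at any stage (the write pattern not matching, the manage step evicting a fact, or the read key not matching) yields the same fluent fallback reply, which is the kind of silent failure the abstract describes at the pipeline level.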
