Trained Persistent Memory for Frozen Decoder-Only LLMs

Hong Jeong

arXiv:2603.22329·cs.LG·March 25, 2026

Open Access

TL;DR

This paper explores methods to enable persistent memory in frozen decoder-only language models like GPT-2, demonstrating that architectural priors significantly influence memory retention at low capacity, with all methods converging at higher capacity.

Contribution

The paper adapts six memory methods to a frozen GPT-2, showing that the choice of architecture matters for persistent memory and establishing trained persistent memory as a general paradigm across transformer models.

Findings

01

Cross-attention, Hebbian, and slot write methods achieve 7-18% memory scores at low capacity.

02

All six methods converge at higher capacity, indicating that architectural priors matter mainly at low capacity.

03

Memory methods improve knowledge retention in frozen decoder-only models.

Abstract

Decoder-only language models are stateless: hidden representations are discarded after every forward pass and nothing persists across sessions. Jeong (2026a) showed that trained memory adapters give a frozen encoder-decoder backbone persistent latent-space memory, building on the lateral-memory framework of Jeong (2026b,c). Here we ask whether the same principle transfers to the decoder-only setting, where no cross-attention pathway exists and memory must enter through self-attention alone. We adapt six methods -- prefix, parallel cross-attention, KV extension, Hebbian memory, context-gated branch, and slot-based sparse write -- to a frozen GPT-2, training only a small adapter $\theta_{\mathrm{mem}}$. The write rule is shared; only the read injection changes from decoder cross-attention to self-attention KV prefix or parallel branch. On LoCoMo we find a striking inductive-bias dichotomy: at…
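
The abstract's "self-attention KV prefix" read path can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes a single attention head and plain Python lists, and the function name and the hand-filled memory slots `mem_k` and `mem_v` are hypothetical. In the paper these slots would come from the trained adapter. The sketch shows memory keys and values being prepended to the token keys and values, so that every query position can read from memory while causal masking still applies among the tokens themselves.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend_with_memory_prefix(q, k, v, mem_k, mem_v):
    """Single-head causal self-attention with memory slots prepended to K and V.

    q, k, v: lists of T vectors (one per token), each of dimension d.
    mem_k, mem_v: lists of P memory vectors (hypothetical; stand-ins for
    what a trained memory adapter would produce).
    Returns (outputs, weights), one entry per token position.
    """
    d = len(q[0])
    p = len(mem_k)
    k_ext = mem_k + k  # memory slots first, then token keys
    v_ext = mem_v + v
    outputs, weights = [], []
    for t, q_t in enumerate(q):
        # Position t sees all P memory slots plus tokens 0..t (causal).
        visible = p + t + 1
        scores = [dot(q_t, k_ext[j]) / math.sqrt(d) for j in range(visible)]
        w = softmax(scores)
        out = [sum(w[j] * v_ext[j][c] for j in range(visible))
               for c in range(len(v_ext[0]))]
        outputs.append(out)
        weights.append(w)
    return outputs, weights

# Illustrative inputs only: two tokens, one memory slot, d = 2.
tokens = [[1.0, 0.0], [0.0, 1.0]]
demo_out, demo_w = attend_with_memory_prefix(
    tokens, tokens, tokens, mem_k=[[0.5, 0.5]], mem_v=[[2.0, 2.0]]
)
```

With an empty memory (`mem_k=[]`, `mem_v=[]`) the function reduces to ordinary causal self-attention, which matches the idea that the frozen backbone is unchanged and memory only adds extra key-value entries to attend over.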


Taxonomy

Topics: Ferroelectric and Negative Capacitance Devices · Topic Modeling · Natural Language Processing Techniques