Trained Persistent Memory for Frozen Decoder-Only LLMs
Hong Jeong

TL;DR
This paper explores methods to enable persistent memory in frozen decoder-only language models like GPT-2, demonstrating that architectural priors significantly influence memory retention at low capacity, with all methods converging at higher capacity.
Contribution
It adapts six memory methods to a frozen GPT-2, shows that architectural priors shape how well persistent memory works, and argues that trained persistent memory is a general paradigm that extends from encoder-decoder to decoder-only transformer models.
Findings
Parallel cross-attention, Hebbian memory, and slot-based sparse write achieve memory scores of 7-18% at low capacity.
All six methods converge at higher capacity, indicating that architectural priors matter most when memory capacity is low.
Memory methods improve knowledge retention in frozen decoder-only models.
Abstract
Decoder-only language models are stateless: hidden representations are discarded after every forward pass and nothing persists across sessions. Jeong (2026a) showed that trained memory adapters give a frozen encoder-decoder backbone persistent latent-space memory, building on the lateral-memory framework of Jeong (2026b,c). Here we ask whether the same principle transfers to the decoder-only setting, where no cross-attention pathway exists and memory must enter through self-attention alone. We adapt six methods -- prefix, parallel cross-attention, KV extension, Hebbian memory, context-gated branch, and slot-based sparse write -- to a frozen GPT-2, training only a small adapter. The write rule is shared; only the read injection changes from decoder cross-attention to self-attention KV prefix or parallel branch. On LoCoMo we find a striking inductive-bias dichotomy: at…
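The abstract's idea of reading memory through a self-attention KV prefix can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes a single attention head and a single query position, and the names attend, attend_with_memory, mem_keys, and mem_values are hypothetical. The memory vectors stand in for whatever a trained adapter would produce, while the ordinary keys and values stand in for the frozen backbone's own projections.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    # Single-head scaled dot-product attention for one query vector.
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    out = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return out, weights

def attend_with_memory(query, keys, values, mem_keys, mem_values):
    # Assumption: memory is read by prepending adapter-produced K/V pairs
    # to the frozen model's K/V, so the query attends over both.
    return attend(query, mem_keys + keys, mem_values + values)

# Toy example: two context tokens plus one memory slot (all values made up).
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0]]
mem_keys = [[2.0, 0.0]]
mem_values = [[5.0, 5.0]]

out, weights = attend_with_memory(query, keys, values, mem_keys, mem_values)
print(weights)  # first weight belongs to the memory slot
```

Because the memory only extends the key and value lists, the frozen attention computation itself is unchanged; with empty memory lists the function reduces to ordinary attention.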
Taxonomy
Topics: Ferroelectric and Negative Capacitance Devices · Topic Modeling · Natural Language Processing Techniques
