MemoryLLM: Plug-n-Play Interpretable Feed-Forward Memory for Transformers

Ajay Jaiswal; Lauren Hannah; Han-Byul Kim; Duc Hoang; Arnav Kundu; Mehrdad Farajtabar; Minsik Cho

arXiv:2602.00398·cs.LG·February 3, 2026

Open Access

TL;DR

MemoryLLM decouples transformer feed-forward networks (FFNs) from self-attention by training them directly on token embeddings. Each FFN can then be studied as a context-free, token-wise neural retrieval memory and pre-computed as a lookup table, which improves inference efficiency.

Contribution

The paper presents MemoryLLM, a method to make FFNs context-free and interpretable, and introduces Flex-MemoryLLM to bridge performance gaps in transformer models.

Findings

01  MemoryLLM enables token-wise lookups for FFNs, improving interpretability.

02  Decoupled FFNs can be pre-computed for efficient inference.

03  Flex-MemoryLLM reduces the performance gap in transformer architectures.
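
The second finding can be illustrated with a small sketch. If an FFN sees only the token embedding and no attention context, its output depends on the token id alone, so it can be computed once per vocabulary entry and stored. Everything below is an assumption made for illustration: the two-layer ReLU FFN, the toy sizes, and the names `ffn`, `build_token_lookup`, and `tol_table` are hypothetical and are not taken from the paper.

```python
import random

def make_matrix(rows, cols, rng):
    # Random dense matrix stored as a list of rows.
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def matvec(vec, mat):
    # vec has len(mat) entries; mat is rows x cols; result has cols entries.
    return [sum(v * row[j] for v, row in zip(vec, mat)) for j in range(len(mat[0]))]

def ffn(x, w_up, w_down):
    # Assumed FFN shape: up-projection, ReLU, down-projection.
    hidden = [max(0.0, h) for h in matvec(x, w_up)]
    return matvec(hidden, w_down)

def build_token_lookup(embeddings, w_up, w_down):
    # Context-free FFN: one output row per vocabulary entry.
    return [ffn(e, w_up, w_down) for e in embeddings]

rng = random.Random(0)
VOCAB, D_MODEL, D_HIDDEN = 8, 4, 16  # toy sizes, chosen arbitrarily
embeddings = make_matrix(VOCAB, D_MODEL, rng)
w_up = make_matrix(D_MODEL, D_HIDDEN, rng)
w_down = make_matrix(D_HIDDEN, D_MODEL, rng)

# Offline step: pre-compute the token-wise lookup table.
tol_table = build_token_lookup(embeddings, w_up, w_down)

# Inference step: index the table instead of running the FFN.
token_ids = [3, 1, 3, 7]
looked_up = [tol_table[t] for t in token_ids]
recomputed = [ffn(embeddings[t], w_up, w_down) for t in token_ids]
assert looked_up == recomputed
```

In this sketch the table costs one row of `D_MODEL` values per vocabulary entry, and repeated tokens (id 3 above) reuse the same row. A real model would need one such table per decoupled FFN layer.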

Abstract

Understanding how transformer components operate in LLMs is important, as it is at the core of recent technological advances in artificial intelligence. In this work, we revisit the challenges associated with interpretability of feed-forward modules (FFNs) and propose MemoryLLM, which aims to decouple FFNs from self-attention and enables us to study the decoupled FFNs as context-free token-wise neural retrieval memory. In detail, we investigate how input tokens access memory locations within FFN parameters and the importance of FFN memory across different downstream tasks. MemoryLLM achieves context-free FFNs by training them in isolation from self-attention directly using the token embeddings. This approach allows FFNs to be pre-computed as token-wise lookups (ToLs), enabling on-demand transfer between VRAM and storage, additionally enhancing inference efficiency. We also introduce…
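
The abstract mentions on-demand transfer of token-wise lookups between VRAM and storage. One way to picture this is a small fast cache in front of a table kept on disk, sketched below. This is only an illustration under stated assumptions: the LRU eviction policy, the float32 row layout, and the names `write_table` and `OnDemandLookup` are hypothetical, and an in-process dictionary stands in for VRAM. The source does not describe the paper's actual transfer mechanism.

```python
import os
import struct
import tempfile
from collections import OrderedDict

D_MODEL = 4
ROW_BYTES = D_MODEL * 4  # one float32 per dimension

def write_table(path, table):
    # Store the pre-computed lookup rows back to back as float32.
    with open(path, "wb") as f:
        for row in table:
            f.write(struct.pack(f"{D_MODEL}f", *row))

class OnDemandLookup:
    """Keeps at most `capacity` rows in fast memory; the rest stay on disk."""

    def __init__(self, path, capacity):
        self.path = path
        self.capacity = capacity
        self.cache = OrderedDict()  # stand-in for VRAM-resident rows
        self.disk_reads = 0

    def get(self, token_id):
        if token_id in self.cache:
            self.cache.move_to_end(token_id)  # mark as most recently used
            return self.cache[token_id]
        with open(self.path, "rb") as f:
            f.seek(token_id * ROW_BYTES)  # rows are fixed-size, so seek directly
            row = struct.unpack(f"{D_MODEL}f", f.read(ROW_BYTES))
        self.disk_reads += 1
        self.cache[token_id] = row
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict least recently used row
        return row

# Toy table whose values are exactly representable in float32.
table = [[float(t), t + 0.5, t + 0.25, t + 0.75] for t in range(8)]
path = os.path.join(tempfile.mkdtemp(), "tol.bin")
write_table(path, table)

store = OnDemandLookup(path, capacity=2)
rows = [store.get(t) for t in [3, 3, 5, 3, 6, 5]]
```

Because each row depends only on the token id, a fetch needs nothing beyond a fixed-offset read, which is what makes this kind of paging straightforward for a lookup table compared with a context-dependent FFN.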

Taxonomy

Topics: Advanced Neural Network Applications · Explainable Artificial Intelligence (XAI) · Advanced Memory and Neural Computing