Improving Sparse Memory Finetuning

Satyam Goyal; Anirudh Kanchi; Garv Shah; Prakhar Gupta

arXiv:2604.05248·cs.LG·April 8, 2026

TL;DR

This paper improves Sparse Memory Finetuning (SMF), a method that localizes model updates to sparse memory layers, so that large language models can learn continually with minimal forgetting on consumer hardware.

Contribution

The paper presents an open-source pipeline for retrofitting pretrained models (Qwen-2.5-0.5B) with sparse memory modules, together with a slot-selection mechanism based on Kullback-Leibler (KL) divergence for continual learning.
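The idea of KL-based slot selection can be illustrated with a small sketch. The details below are assumptions, not the paper's actual rule: the sketch supposes that each memory slot has an access count on the new data and on a background corpus, scores each slot by its pointwise contribution to KL(new || background), and keeps the top k. The names `kl_slot_scores` and `select_slots` are hypothetical.

```python
import math


def kl_slot_scores(new_counts, background_counts, eps=1e-9):
    """Score each slot by its term in KL(p_new || p_background).

    Both arguments map slot index -> access count. This scoring rule is an
    illustrative assumption; the paper's mechanism may differ.
    """
    new_total = sum(new_counts.values())
    bg_total = sum(background_counts.values())
    if new_total <= 0 or bg_total <= 0:
        raise ValueError("both count tables need at least one access")

    scores = {}
    for slot in set(new_counts) | set(background_counts):
        p = new_counts.get(slot, 0) / new_total
        q = background_counts.get(slot, 0) / bg_total
        # Pointwise KL term p * log(p / q); eps avoids log(0) and division by zero.
        scores[slot] = p * math.log((p + eps) / (q + eps))
    return scores


def select_slots(new_counts, background_counts, k):
    """Return the k slots with the largest KL contribution (ties by index)."""
    scores = kl_slot_scores(new_counts, background_counts)
    return sorted(scores, key=lambda s: (-scores[s], s))[:k]


if __name__ == "__main__":
    new = {0: 10, 1: 10, 2: 80}          # slot 2 is hit far more on new data
    background = {0: 50, 1: 45, 2: 5}
    print(select_slots(new, background, k=1))
```

Under this scoring, slots that are accessed much more often on the new data than on the background corpus rank highest, which is one plausible reading of "update relevance".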

Findings

01. Retrofitted models acquire new factual knowledge effectively.

02. Models show minimal forgetting of existing capabilities.

03. The KL-based slot selection improves update relevance.

Abstract

Large Language Models (LLMs) are typically static after training, yet real-world applications require continual adaptation to new knowledge without degrading existing capabilities. Standard approaches to updating models, like full finetuning or parameter-efficient methods (e.g., LoRA), face a fundamental problem: catastrophic forgetting. Because they modify shared dense representations, updates interfere across tasks. Sparse Memory Finetuning (SMF) offers a promising alternative by localizing updates to a small subset of parameters in explicit memory layers. In this work, we present an open-source pipeline to retrofit existing pretrained models (Qwen-2.5-0.5B) with sparse memory modules, enabling effective continual learning on consumer hardware. We extend prior work by introducing a theoretically grounded slot-selection mechanism based on Kullback-Leibler (KL) divergence, which…
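To make "localizing updates to a small subset of parameters" concrete, here is a minimal sketch of a sparse update on a memory table. It assumes the memory is a list of slot vectors, that a gradient is available for every slot, and that plain SGD is used; the function name `sparse_memory_update` and these choices are hypothetical and are not taken from the paper's pipeline.

```python
def sparse_memory_update(memory, grads, selected_slots, lr=0.1):
    """Apply one SGD step to the selected memory slots only.

    memory and grads are lists of equal-length rows (one row per slot).
    Rows outside selected_slots are copied unchanged, so knowledge stored
    there cannot be overwritten by this step. Illustrative sketch only.
    """
    selected = set(selected_slots)
    updated = []
    for idx, (row, grad_row) in enumerate(zip(memory, grads)):
        if idx in selected:
            # Trainable slot: ordinary gradient descent step.
            updated.append([w - lr * g for w, g in zip(row, grad_row)])
        else:
            # Frozen slot: the gradient is ignored.
            updated.append(list(row))
    return updated


if __name__ == "__main__":
    memory = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
    grads = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
    print(sparse_memory_update(memory, grads, selected_slots=[1], lr=0.5))
```

The contrast with full finetuning or LoRA is that those methods would change parameters shared by every input, whereas here only the chosen rows move.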
