Improving Sparse Memory Finetuning
Satyam Goyal, Anirudh Kanchi, Garv Shah, Prakhar Gupta

TL;DR
This paper builds on Sparse Memory Finetuning (SMF), a method that localizes model updates to memory layers, to enable continual learning in large language models with minimal forgetting on consumer hardware.
Contribution
The paper presents an open-source pipeline for retrofitting pretrained models with sparse memory modules, together with a KL divergence-based slot-selection mechanism that decides which memory slots to update during continual learning.
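To make the idea of localized updates concrete, here is a minimal sketch of a gradient step that touches only a chosen set of memory slots and leaves every other slot frozen. It is an illustration under assumptions, not the authors' implementation: the function name `sparse_sgd_step`, the plain-list memory table, and the use of vanilla SGD are all hypothetical choices made for this example.

```python
def sparse_sgd_step(memory, grads, selected_slots, lr=0.1):
    """Return a new memory table where only the selected rows took an SGD step.

    memory, grads: lists of equal-length float lists, one row per memory slot.
    selected_slots: indices of the slots allowed to change (assumed to come
    from some slot-selection rule; how they are chosen is out of scope here).
    """
    selected = set(selected_slots)
    updated = []
    for i, (row, grad_row) in enumerate(zip(memory, grads)):
        if i in selected:
            # Trainable slot: ordinary gradient descent update.
            updated.append([w - lr * g for w, g in zip(row, grad_row)])
        else:
            # Frozen slot: copied through unchanged, so whatever it
            # stores cannot be overwritten by this step.
            updated.append(list(row))
    return updated


memory = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
grads = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
new_memory = sparse_sgd_step(memory, grads, selected_slots=[1], lr=0.5)
```

Because the unselected rows are never written, interference with what they already encode is ruled out by construction in this toy setting; a real pipeline would typically achieve the same effect by masking gradients inside the training framework.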
Findings
Retrofitted models acquire new factual knowledge effectively.
Models show minimal forgetting of existing capabilities.
The KL-based slot selection improves update relevance.
Abstract
Large Language Models (LLMs) are typically static after training, yet real-world applications require continual adaptation to new knowledge without degrading existing capabilities. Standard approaches to updating models, such as full finetuning or parameter-efficient methods (e.g., LoRA), face a fundamental trade-off between learning new knowledge and catastrophic forgetting: they modify shared dense representations, causing interference across tasks. Sparse Memory Finetuning (SMF) offers a promising alternative by localizing updates to a small subset of parameters in explicit memory layers. In this work, we present an open-source pipeline to retrofit existing pretrained models (Qwen-2.5-0.5B) with sparse memory modules, enabling effective continual learning on consumer hardware. We extend prior work by introducing a theoretically grounded slot-selection mechanism based on Kullback-Leibler (KL) divergence, which…
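The abstract is cut off before it describes how the KL-based selection works, so the following is only one plausible reading, offered as a sketch. It assumes that each memory slot has an access count on the new data and on some background corpus, and it ranks slots by their pointwise contribution p_i * log(p_i / q_i) to KL(p || q), where p and q are the smoothed access distributions. The function names, the additive smoothing, and the use of access counts are assumptions made for this example and are not taken from the paper.

```python
import math


def slot_distribution(counts, num_slots, smoothing=1e-6):
    """Convert raw slot-access counts into a smoothed probability distribution."""
    total = sum(counts.get(i, 0) for i in range(num_slots)) + smoothing * num_slots
    return [(counts.get(i, 0) + smoothing) / total for i in range(num_slots)]


def kl_slot_scores(new_counts, background_counts, num_slots):
    """Per-slot contribution to KL(p_new || q_background).

    The contributions sum to the full KL divergence. A slot scores high
    when it is accessed often on new data but rarely on background data.
    """
    p = slot_distribution(new_counts, num_slots)
    q = slot_distribution(background_counts, num_slots)
    return [pi * math.log(pi / qi) for pi, qi in zip(p, q)]


def select_slots(new_counts, background_counts, num_slots, top_t):
    """Return the indices of the top_t slots by KL contribution (assumed rule)."""
    scores = kl_slot_scores(new_counts, background_counts, num_slots)
    ranked = sorted(range(num_slots), key=lambda i: scores[i], reverse=True)
    return ranked[:top_t]


# Hypothetical access counts: slot 3 is heavily used by the new data
# but almost never by the background corpus.
new_counts = {0: 1, 1: 1, 2: 50, 3: 40}
background_counts = {0: 100, 1: 100, 2: 100, 3: 1}
chosen = select_slots(new_counts, background_counts, num_slots=4, top_t=2)
```

In this toy example slot 3 ranks first even though slot 2 is accessed more often on the new data, because slot 2 is also common in the background distribution. Favoring slots that are specific to the new data is the intuition behind this kind of ranking; whether the paper's mechanism matches this form cannot be confirmed from the truncated abstract.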
