Locas: Your Models are Principled Initializers of Locally-Supported Parametric Memories
Sidi Lu, Zhenwen Liang, Dongyang Ma, Yan Wang, Haitao Mi, Dong Yu

TL;DR
Locas introduces a principled, locally-supported parametric memory for transformers that enables efficient continual learning and memory offloading, improving model generalization and reducing catastrophic forgetting with minimal additional parameters.
Contribution
The paper proposes Locas, a novel parametric memory mechanism for transformers, with a principled initialization method that enhances continual learning and memory integration.
Findings
Locas achieves effective memory storage with only 0.02% additional parameters.
Locas improves generalization and reduces catastrophic forgetting in language modeling.
Locas demonstrates strong performance on long-context dialogue and book modeling tasks.
Abstract
In this paper, we aim to bridge test-time-training with a new type of parametric memory that can be flexibly offloaded from or merged into model parameters. We present Locas, a Locally-Supported parametric memory that shares the design of FFN blocks in modern transformers, allowing it to be flexibly permanentized into the model parameters while supporting efficient continual learning. We discuss two major variants of Locas: one with a conventional two-layer MLP design that has a clearer theoretical guarantee; the other one shares the same GLU-FFN structure with SOTA LLMs, and can be easily attached to existing models for both parameter-efficient and computation-efficient continual learning. Crucially, we show that proper initialization of such low-rank sideway-FFN-style memories -- performed in a principled way by reusing model parameters, activations and/or gradients -- is essential…
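One way to picture the "offloaded from or merged into model parameters" property for the GLU-FFN variant is the minimal NumPy sketch below. It is not the authors' implementation. It only illustrates a general algebraic fact: a narrow GLU-FFN added alongside a wider one equals a single GLU-FFN whose hidden dimension is the sum of the two. The names `glu_ffn`, `merge_memory`, `base`, `mem`, and `zero_mem`, the SiLU gate, and all dimensions are assumptions made for illustration. The zero-initialized down projection is a generic placeholder that leaves the base model's output unchanged, not the principled initialization the abstract refers to.

```python
import numpy as np

def silu(x):
    # SiLU gate, assumed here as the GLU activation.
    return x / (1.0 + np.exp(-x))

def glu_ffn(x, w_gate, w_up, w_down):
    # GLU-style FFN: (silu(x @ w_gate) * (x @ w_up)) @ w_down
    return (silu(x @ w_gate) * (x @ w_up)) @ w_down

def merge_memory(base, mem):
    # Fold a side memory into the base FFN by widening the hidden dimension:
    # gate/up matrices gain columns, the down matrix gains rows.
    bg, bu, bd = base
    mg, mu, md = mem
    return (np.concatenate([bg, mg], axis=1),
            np.concatenate([bu, mu], axis=1),
            np.concatenate([bd, md], axis=0))

rng = np.random.default_rng(0)
d_model, d_ff, r = 16, 64, 2  # hypothetical sizes; r is the memory width

base = (rng.normal(size=(d_model, d_ff)),
        rng.normal(size=(d_model, d_ff)),
        rng.normal(size=(d_ff, d_model)))
mem = (rng.normal(size=(d_model, r)),
       rng.normal(size=(d_model, r)),
       rng.normal(size=(r, d_model)))
# Placeholder init (an assumption): a zero down projection makes the memory a no-op.
zero_mem = (mem[0], mem[1], np.zeros((r, d_model)))

x = rng.normal(size=(4, d_model))

# Memory attached as a parallel ("sideway") branch of the frozen FFN.
side_by_side = glu_ffn(x, *base) + glu_ffn(x, *mem)
# The same memory folded permanently into the FFN weights.
merged = glu_ffn(x, *merge_memory(base, mem))
```

In this sketch, `side_by_side` and `merged` agree up to floating-point error, so the memory can be kept as a separate branch or folded into the FFN without changing the function computed. The memory adds `3 * d_model * r` weights per FFN block under these assumptions.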
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques
