TL;DR
This paper introduces MoRAM, a continual learning method that incrementally adds rank-1 associative memory experts, improving knowledge retention and generalization in large models by reducing interference and routing ambiguity.
Contribution
MoRAM models weight matrices as associative memories, enabling incremental, fine-grained learning with content-addressable retrieval, outperforming existing mixture-of-experts approaches.
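To make the associative-memory view concrete, here is a minimal sketch of a textbook linear associative memory, in which each stored key-value pair is one rank-1 outer-product update to a weight matrix. It is not taken from the paper: the helper names (`outer`, `matvec`, `write`) and the toy vectors are assumptions chosen for illustration, and the keys are deliberately orthonormal so that recall is exact.

```python
# Illustrative sketch only: a classic linear associative memory, not MoRAM itself.
# All names and values below are hypothetical.

def outer(value, key):
    """Rank-1 matrix value * key^T, stored as a list of rows."""
    return [[v * k for k in key] for v in value]

def add_matrices(a, b):
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]

def matvec(matrix, x):
    """Matrix-vector product."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in matrix]

def write(memory, key, value):
    """Store one key -> value association as a rank-1 update: W + value key^T."""
    return add_matrices(memory, outer(value, key))

# Two orthonormal keys in R^3, each paired with a value in R^2.
k1, v1 = [1.0, 0.0, 0.0], [2.0, -1.0]
k2, v2 = [0.0, 1.0, 0.0], [0.5, 3.0]

W = [[0.0] * 3 for _ in range(2)]  # empty 2x3 memory
W = write(W, k1, v1)
W = write(W, k2, v2)

# Content-addressable retrieval: querying with a key returns its stored value.
recalled_1 = matvec(W, k1)
recalled_2 = matvec(W, k2)
```

With orthonormal keys, the two writes do not interfere, so each query recovers its own value. When keys overlap, recall mixes the stored values, which is one way to picture the interference the summary refers to.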
Findings
MoRAM outperforms state-of-the-art methods on CLIP and LLMs.
It achieves a better plasticity-stability trade-off.
It reduces forgetting and improves generalization.
Abstract
Continual learning (CL) with large pre-trained models aims to incrementally acquire knowledge without catastrophic forgetting. Existing LoRA-based Mixture-of-Experts (MoE) methods expand capacity by adding isolated new experts while freezing old ones, but still suffer from redundancy, interference, routing ambiguity, and consequent forgetting. We investigate the issues stemming from coarse-grained expert granularity. Coarse-grained experts (e.g., high-rank LoRA) encode low-specialty information, leading to expert duplication/interference and routing degradation/confusion as experts accumulate. In this work, we propose MoRAM (Mixture of Rank-1 Associative Memory). Grounded in the view that weight matrices act as linear associative memories, MoRAM achieves CL as gradual incrementing of reusable atomic rank-1 experts as memory. Each rank-1 adapter acts as a fine-grained MoE expert or an…
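The idea of growing a pool of rank-1 experts while leaving earlier ones untouched can be sketched roughly as follows. This is a toy under stated assumptions, not the authors' implementation: the class `Rank1Mixture`, its `add_expert` and `forward` methods, and the routing rule (pick the experts whose keys respond most strongly to the input) are all hypothetical, since the excerpt above does not specify how MoRAM routes or trains its experts.

```python
# Illustrative sketch only: a hypothetical mixture of rank-1 experts.
# The routing rule (top-k by key response) is an assumption, not the paper's method.
from dataclasses import dataclass, field

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

@dataclass
class Rank1Mixture:
    keys: list = field(default_factory=list)    # one input-side vector per expert
    values: list = field(default_factory=list)  # one output-side vector per expert

    def add_expert(self, key, value):
        """Append a new rank-1 expert; existing experts are never modified."""
        self.keys.append(list(key))
        self.values.append(list(value))

    def forward(self, x, top_k=1):
        """Route x to the top_k experts with the largest |key . x| and sum their outputs.

        Assumes at least one expert has been added.
        """
        scores = [dot(k, x) for k in self.keys]
        ranked = sorted(range(len(scores)), key=lambda i: abs(scores[i]), reverse=True)
        chosen = ranked[:top_k]
        out = [0.0] * len(self.values[0])
        for i in chosen:
            # Each expert contributes value_i * (key_i . x), a rank-1 map applied to x.
            out = [o + scores[i] * v for o, v in zip(out, self.values[i])]
        return out, sorted(chosen)

mix = Rank1Mixture()
mix.add_expert(key=[1.0, 0.0], value=[1.0, 1.0])    # learned on a first task
before, _ = mix.forward([1.0, 0.0])

mix.add_expert(key=[0.0, 1.0], value=[0.0, 2.0])    # added later for a second task
after, routed_old = mix.forward([1.0, 0.0])
new_out, routed_new = mix.forward([0.0, 1.0])
```

In this toy setup the first-task input still routes to the first expert after the second expert is added, so its output is unchanged, while the second-task input routes to the new expert. Real inputs are rarely this cleanly separated, which is where the routing ambiguity described in the abstract would arise.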
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Speech Recognition and Synthesis
Methods: Pruning · Contrastive Language-Image Pre-training
