Learning Mamba as a Continual Learner: Meta-learning Selective State Space Models for Efficient Continual Learning
Chongyang Zhao, Dong Gong

TL;DR
This paper introduces MambaCL, an attention-free meta-learned sequence model for continual learning, which avoids storing all past data and demonstrates strong performance and generalization in non-stationary environments.
Contribution
It proposes MambaCL, an attention-free, meta-learned continual learner built on the selective state space model Mamba and trained with a selectivity regularization. The approach addresses the efficiency and scalability issues of previous models.
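The efficiency issue comes down to memory growth. A Transformer keeps a key/value cache that grows with every example it has seen, while a Mamba-style model carries a fixed-size recurrent state. The sketch below only counts floats to make that contrast concrete. It assumes each training example in the stream becomes one token, and the sizes (`d_model`, `d_inner`, `d_state`, `num_layers`) are hypothetical values chosen for illustration, not the paper's configuration. Mamba's short-convolution buffer is ignored.

```python
def kv_cache_floats(num_tokens: int, d_model: int, num_layers: int) -> int:
    # Each layer caches one key and one value vector (size d_model) per token,
    # so the total grows linearly with the number of tokens seen.
    return 2 * num_layers * num_tokens * d_model


def ssm_state_floats(d_inner: int, d_state: int, num_layers: int) -> int:
    # Each layer keeps a (d_inner x d_state) recurrent state whose size is
    # independent of stream length (short-convolution buffer ignored).
    return num_layers * d_inner * d_state


# Hypothetical sizes, chosen only for illustration.
for num_examples in (100, 1_000, 10_000):
    kv = kv_cache_floats(num_examples, d_model=256, num_layers=4)
    ssm = ssm_state_floats(d_inner=512, d_state=16, num_layers=4)
    print(f"{num_examples:>6} examples: KV cache {kv:>9} floats, SSM state {ssm} floats")
```

With these sizes the two footprints are equal at 16 examples (2048 floats per cached example versus a constant 32768). Beyond that point, the cache keeps growing while the state stays fixed.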
Findings
MambaCL outperforms traditional models in various MCL scenarios.
Attention-free models like Mamba show strong generalization in continual learning.
Selectivity regularization improves both the training efficiency and the performance of MambaCL.
Abstract
Continual learning (CL) aims to efficiently learn from a non-stationary data stream, without storing or recomputing all seen samples. CL enables prediction on new tasks by incorporating sequential training samples. Building on this connection between CL and sequential modeling, meta-continual learning (MCL) aims to meta-learn an efficient continual learner as a sequence prediction model, with advanced sequence models like Transformers being natural choices. However, despite decent performance, Transformers rely on a linearly growing cache to store all past representations, conflicting with CL's objective of not storing all seen samples and limiting efficiency. In this paper, we focus on meta-learning sequence-prediction-based continual learners without retaining all past representations. While attention-free models with fixed-size hidden states (e.g., Linear Transformers) align with…
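To make "fixed-size hidden state" concrete, here is a minimal NumPy sketch of a single-layer, Mamba-style selective recurrence. The step size `delta` and the matrices `B` and `C` are computed from each input token, which is what makes the recurrence selective. This is a generic illustration and not MambaCL's architecture. The weights are random, gating, the short convolution, and stacking are omitted, and the class name `SelectiveSSM` and the idea that each encoded (x, y) training example is fed as one token are assumptions for illustration only.

```python
import numpy as np


class SelectiveSSM:
    """Hypothetical toy selective SSM layer (Mamba-style), random weights."""

    def __init__(self, d: int, n: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        # Negative real diagonal A, so exp(delta * A) lies in (0, 1): stable decay.
        self.A = -np.exp(rng.normal(size=(d, n)))
        self.W_delta = rng.normal(scale=0.1, size=(d, d))
        self.W_B = rng.normal(scale=0.1, size=(d, n))
        self.W_C = rng.normal(scale=0.1, size=(d, n))
        self.h = np.zeros((d, n))  # fixed-size state, independent of stream length

    def step(self, x: np.ndarray) -> np.ndarray:
        # Input-dependent ("selective") parameters for this token.
        delta = np.log1p(np.exp(x @ self.W_delta))  # softplus, shape (d,)
        B = x @ self.W_B                            # shape (n,)
        C = x @ self.W_C                            # shape (n,)
        # Discretize: zero-order hold for A, simplified (delta * B) for B.
        A_bar = np.exp(delta[:, None] * self.A)     # shape (d, n)
        B_bar = delta[:, None] * B[None, :]         # shape (d, n)
        # Recurrent update, then read out with C.
        self.h = A_bar * self.h + B_bar * x[:, None]
        return self.h @ C                           # shape (d,)


# Usage: consume a stream of (hypothetically encoded) training examples, then a query.
model = SelectiveSSM(d=8, n=4)
stream = np.random.default_rng(1).normal(size=(50, 8))
for token in stream:
    model.step(token)
query_features = model.step(np.ones(8))  # would feed a prediction head (omitted)
print(model.h.shape, query_features.shape)
```

However many examples pass through `step`, `model.h` keeps the shape `(d, n)`. This is the property that lets an attention-free continual learner avoid storing past representations, in contrast to a Transformer's growing cache.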
Taxonomy
Topics: Education and Technology Integration
Methods: Mamba: Linear-Time Sequence Modeling with Selective State Spaces
