Learning Mamba as a Continual Learner: Meta-learning Selective State Space Models for Efficient Continual Learning

Chongyang Zhao; Dong Gong

arXiv:2412.00776·cs.LG·May 27, 2025

Open Access

TL;DR

This paper introduces MambaCL, an attention-free meta-learned sequence model for continual learning, which avoids storing all past data and demonstrates strong performance and generalization in non-stationary environments.

Contribution

The paper proposes MambaCL, an attention-free meta-learning approach to continual learning with selectivity regularization, which addresses the efficiency and scalability issues of previous models.

Findings

01. MambaCL outperforms traditional models in various meta-continual learning (MCL) scenarios.

02. Attention-free models like Mamba show strong generalization in continual learning.

03. Selectivity regularization improves the training efficiency and performance of MambaCL.

Abstract

Continual learning (CL) aims to efficiently learn from a non-stationary data stream, without storing or recomputing all seen samples. CL enables prediction on new tasks by incorporating sequential training samples. Building on this connection between CL and sequential modeling, meta-continual learning (MCL) aims to meta-learn an efficient continual learner as a sequence prediction model, with advanced sequence models like Transformers being natural choices. However, despite decent performance, Transformers rely on a linearly growing cache to store all past representations, conflicting with CL's objective of not storing all seen samples and limiting efficiency. In this paper, we focus on meta-learning sequence-prediction-based continual learners without retaining all past representations. While attention-free models with fixed-size hidden states (e.g., Linear Transformers) align with…
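
The memory contrast the abstract draws can be illustrated with a minimal sketch. It is not the authors' MambaCL implementation and does not use Mamba's selective state space update; it only compares a softmax-attention learner that caches every past key and value with a Linear-Transformer-style recurrent update that keeps a fixed-size state. All names (`kv_cache_step`, `linear_state_step`, `run_stream`), the dimensions, and the `elu(x) + 1` feature map are assumptions chosen for illustration.

```python
import numpy as np

def feature_map(x):
    # Assumed positive feature map, elu(x) + 1, common in linear attention.
    return np.where(x > 0, x + 1.0, np.exp(x))

def kv_cache_step(cache, k, v, q):
    """Softmax attention over a cache holding every past (key, value) pair."""
    cache["K"].append(k)
    cache["V"].append(v)
    K = np.stack(cache["K"])                  # (t, d): grows with each sample
    V = np.stack(cache["V"])                  # (t, d_v)
    scores = K @ q / np.sqrt(q.shape[0])
    weights = np.exp(scores - scores.max())   # numerically stable softmax
    weights /= weights.sum()
    return weights @ V

def linear_state_step(state, k, v, q, eps=1e-6):
    """Linear-attention-style update: the state size never changes."""
    phi_k, phi_q = feature_map(k), feature_map(q)
    state["S"] += np.outer(phi_k, v)          # (d, d_v) running sum
    state["z"] += phi_k                       # (d,) normalizer
    return (phi_q @ state["S"]) / (phi_q @ state["z"] + eps)

def run_stream(T, d=4, d_v=4, seed=0):
    """Feed T random samples to both learners; return the floats each stores."""
    rng = np.random.default_rng(seed)
    cache = {"K": [], "V": []}
    state = {"S": np.zeros((d, d_v)), "z": np.zeros(d)}
    for _ in range(T):
        k, q = rng.normal(size=d), rng.normal(size=d)
        v = rng.normal(size=d_v)
        kv_cache_step(cache, k, v, q)
        linear_state_step(state, k, v, q)
    cache_floats = sum(a.size for a in cache["K"]) + sum(a.size for a in cache["V"])
    state_floats = state["S"].size + state["z"].size
    return cache_floats, state_floats
```

Calling `run_stream(10)` and then `run_stream(50)` shows the first count growing in proportion to the stream length while the second stays constant, which is the property that makes fixed-state models a closer match to the stated goal of not storing all seen samples.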

Taxonomy

Topics: Education and Technology Integration

Methods: Mamba: Linear-Time Sequence Modeling with Selective State Spaces