MagMax: Leveraging Model Merging for Seamless Continual Learning
Daniel Marczak, Bartłomiej Twardowski, Tomasz Trzciński, Sebastian Cygert

TL;DR
MagMax introduces a novel model merging technique for continual learning, allowing large pre-trained models to learn new tasks sequentially without forgetting, outperforming traditional methods in various scenarios.
Contribution
The paper presents MagMax, a new model-merging strategy that enables continual learning in large pre-trained models, along with an extensive analysis of simple merging techniques.
Findings
Simple merging methods like weight averaging perform well.
MagMax outperforms traditional continual learning approaches.
Effective in class- and domain-incremental learning scenarios.
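The two simple baselines named in the findings, weight averaging and random weight selection, can be sketched as follows. This is a minimal, framework-free illustration and not the authors' code: the `Weights` type, the function names, and the `seed` parameter are assumptions made here, and the summary does not say whether the paper applies these operations to raw weights or to differences from the pre-trained model.

```python
import random
from typing import Dict, List

# Hypothetical representation: parameter name -> flattened list of values.
Weights = Dict[str, List[float]]


def average_merge(models: List[Weights]) -> Weights:
    """Merge by taking the element-wise mean over all models."""
    return {
        name: [sum(vals) / len(models) for vals in zip(*(m[name] for m in models))]
        for name in models[0]
    }


def random_select_merge(models: List[Weights], seed: int = 0) -> Weights:
    """Merge by copying each individual weight from a randomly chosen model."""
    rng = random.Random(seed)  # seeded so the merge is reproducible
    return {
        name: [rng.choice(vals) for vals in zip(*(m[name] for m in models))]
        for name in models[0]
    }


model_a = {"w": [0.0, 2.0]}
model_b = {"w": [1.0, 4.0]}
print(average_merge([model_a, model_b]))  # {'w': [0.5, 3.0]}
```

Both functions assume every model shares the same parameter names and shapes, which holds when all models are fine-tuned from the same pre-trained checkpoint.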
Abstract
This paper introduces a continual learning approach named MagMax, which utilizes model merging to enable large pre-trained models to continuously learn from new data without forgetting previously acquired knowledge. Distinct from traditional continual learning methods that aim to reduce forgetting during task training, MagMax combines sequential fine-tuning with a maximum magnitude weight selection for effective knowledge integration across tasks. Our initial contribution is an extensive examination of model merging techniques, revealing that simple approaches like weight averaging and random weight selection surprisingly hold up well in various continual learning contexts. More importantly, we present MagMax, a novel model-merging strategy that enables continual learning of large pre-trained models for successive tasks. Our thorough evaluation demonstrates the superiority of MagMax in…
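The maximum magnitude weight selection described in the abstract can be illustrated with a short sketch. It rests on assumptions not stated in this summary: that each checkpoint is compared against the pre-trained weights to form a per-task difference (a "task vector"), that the checkpoints come from sequential fine-tuning (each one starting from the previous), and that the selected differences are added back with a scaling coefficient `scale`. All names here (`Weights`, `task_vector`, `max_magnitude_merge`) are hypothetical.

```python
from typing import Dict, List

# Hypothetical representation: parameter name -> flattened list of values.
Weights = Dict[str, List[float]]


def task_vector(finetuned: Weights, pretrained: Weights) -> Weights:
    """Element-wise difference between fine-tuned and pre-trained weights."""
    return {
        name: [f - p for f, p in zip(finetuned[name], pretrained[name])]
        for name in pretrained
    }


def max_magnitude_merge(
    pretrained: Weights, checkpoints: List[Weights], scale: float = 1.0
) -> Weights:
    """For each weight, keep the change with the largest absolute value."""
    vectors = [task_vector(c, pretrained) for c in checkpoints]
    merged: Weights = {}
    for name, base in pretrained.items():
        merged[name] = []
        for i, b in enumerate(base):
            candidates = [v[name][i] for v in vectors]
            winner = max(candidates, key=abs)  # ties go to the earliest task
            merged[name].append(b + scale * winner)
    return merged


pretrained = {"w": [0.0, 1.0, -1.0]}
after_task_1 = {"w": [0.5, 1.0, -1.25]}  # assumed: fine-tuned from pretrained
after_task_2 = {"w": [0.25, 2.0, -0.5]}  # assumed: fine-tuned from after_task_1
print(max_magnitude_merge(pretrained, [after_task_1, after_task_2]))
# {'w': [0.5, 2.0, -0.5]}
```

In this toy example the first weight keeps the change from task 1 (0.5 versus 0.25), while the second and third keep the changes from task 2 (1.0 versus 0.0, and 0.5 versus -0.25), so the merged model mixes contributions from both tasks at the level of individual parameters.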
Taxonomy
Topics: Speech Recognition and Synthesis · Machine Learning in Healthcare · Topic Modeling
