Soup to go: mitigating forgetting during continual learning with model averaging
Anat Kleiman, Gintare Karolina Dziugaite, Jonathan Frankle, Sham Kakade, Mansheej Paul

TL;DR
The paper introduces Sequential Fine-tuning with Averaging (SFA), a novel method for continual learning that merges models during training to mitigate forgetting without storing past data, outperforming existing merging techniques.
Contribution
Proposes SFA, a model merging approach during training that reduces catastrophic forgetting in continual learning without extra data storage or multiple model copies.
Findings
SFA achieves comparable or better performance than state-of-the-art methods.
SFA outperforms traditional merging and penalty methods across image and language tasks.
The method effectively mitigates forgetting without additional computational costs.
Abstract
In continual learning, where task data arrives in a sequence, fine-tuning on later tasks will often lead to performance degradation on earlier tasks. This is especially pronounced when these tasks come from diverse domains. In this setting, how can we mitigate catastrophic forgetting of earlier tasks and retain what the model has learned with minimal computational expenses? Inspired by other merging methods, and L2-regression, we propose Sequential Fine-tuning with Averaging (SFA), a method that merges currently training models with earlier checkpoints during the course of training. SOTA approaches typically maintain a data buffer of past tasks or impose a penalty at each gradient step. In contrast, our method achieves comparable results without the need to store past data, or multiple copies of parameters for each gradient step. Furthermore, our method outperforms common merging…
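As a rough illustration of the idea in the abstract (periodically merging the model being trained with an earlier checkpoint), here is a minimal Python sketch on a toy one-parameter problem. It is not the authors' implementation: the function names, the averaging interval `avg_every`, the mixing weight `beta`, and the quadratic toy loss are all assumptions made for the example.

```python
def average_params(current, anchor, beta):
    """Interpolate each parameter: beta * anchor + (1 - beta) * current."""
    return {k: beta * anchor[k] + (1.0 - beta) * current[k] for k in current}


def finetune_with_averaging(params, grad_fn, steps, lr=0.1, avg_every=10, beta=0.5):
    """Gradient descent on a new task, periodically merged with the starting checkpoint.

    `avg_every` and `beta` are hypothetical knobs chosen for this sketch.
    With beta=0.0 the merge is a no-op, which gives plain sequential fine-tuning.
    """
    anchor = dict(params)   # checkpoint from the end of the previous task
    current = dict(params)
    for step in range(1, steps + 1):
        grads = grad_fn(current)
        current = {k: current[k] - lr * grads[k] for k in current}
        if step % avg_every == 0:
            # Pull the weights back toward the earlier checkpoint.
            current = average_params(current, anchor, beta)
    return current


# Toy setup (assumed): the previous task's optimum is w = 0, the new task's is w = 4.
def new_task_grad(p):
    # Gradient of the loss 0.5 * (w - 4)^2
    return {"w": p["w"] - 4.0}


start = {"w": 0.0}  # weights after training on the previous task
plain = finetune_with_averaging(start, new_task_grad, steps=100, beta=0.0)
merged = finetune_with_averaging(start, new_task_grad, steps=100, beta=0.5)
print(plain["w"], merged["w"])
```

In this toy run, plain fine-tuning drifts all the way to the new task's optimum, while the averaged run settles between the two optima. Only one extra copy of the parameters (the anchor checkpoint) is kept, and no data from the earlier task is stored.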
Taxonomy
Topics: Intelligent Tutoring Systems and Adaptive Learning
