Soup to go: mitigating forgetting during continual learning with model averaging

Anat Kleiman; Gintare Karolina Dziugaite; Jonathan Frankle; Sham Kakade; Mansheej Paul

arXiv:2501.05559·cs.LG·January 13, 2025

Open Access

TL;DR

The paper introduces Sequential Fine-tuning with Averaging (SFA), a novel method for continual learning that merges models during training to mitigate forgetting without storing past data, outperforming existing merging techniques.

Contribution

Proposes SFA, which merges the model being trained with earlier checkpoints during training. This reduces catastrophic forgetting in continual learning without storing past data or keeping multiple copies of the parameters.

Findings

01. SFA achieves comparable or better performance than state-of-the-art methods.

02. SFA outperforms traditional merging and penalty methods across image and language tasks.

03. The method mitigates forgetting with minimal computational overhead, since it needs neither a buffer of past data nor multiple copies of parameters at each gradient step.

Abstract

In continual learning, where task data arrives in a sequence, fine-tuning on later tasks will often lead to performance degradation on earlier tasks. This is especially pronounced when these tasks come from diverse domains. In this setting, how can we mitigate catastrophic forgetting of earlier tasks and retain what the model has learned with minimal computational expenses? Inspired by other merging methods, and L2-regression, we propose Sequential Fine-tuning with Averaging (SFA), a method that merges currently training models with earlier checkpoints during the course of training. SOTA approaches typically maintain a data buffer of past tasks or impose a penalty at each gradient step. In contrast, our method achieves comparable results without the need to store past data, or multiple copies of parameters for each gradient step. Furthermore, our method outperforms common merging…
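The merging step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `merge`, `fine_tune`, `avg_every`, and `beta` are hypothetical, the choice to average with the checkpoint saved at the end of the previous task is an assumption, and the two "tasks" are toy quadratic losses chosen only to make forgetting visible.

```python
# Toy sketch of averaging with an earlier checkpoint during fine-tuning.
# All names and hyperparameters here are illustrative assumptions.

def merge(current, checkpoint, beta):
    """Weighted average: beta * checkpoint + (1 - beta) * current."""
    return [beta * c0 + (1.0 - beta) * c for c, c0 in zip(current, checkpoint)]


def fine_tune(theta, grad_fn, steps, lr, checkpoint=None, avg_every=None, beta=0.5):
    """Plain gradient descent; optionally merge with `checkpoint` every `avg_every` steps."""
    theta = list(theta)
    for step in range(1, steps + 1):
        grads = grad_fn(theta)
        theta = [t - lr * g for t, g in zip(theta, grads)]
        # Periodic merge with the earlier checkpoint (assumed schedule).
        if checkpoint is not None and avg_every and step % avg_every == 0:
            theta = merge(theta, checkpoint, beta)
    return theta


def quad_grad(target):
    """Gradient of the toy loss sum((theta - target)^2)."""
    return lambda theta: [2.0 * (t - target) for t in theta]


# Task A has its optimum at 0.0, task B at 10.0.
theta_a = [0.0]              # checkpoint after "training" on task A
grad_b = quad_grad(10.0)     # gradient of the task B loss

# Sequential fine-tuning alone drifts all the way to the task B optimum.
plain = fine_tune(theta_a, grad_b, steps=100, lr=0.1)

# Averaging with the task A checkpoint every 10 steps keeps the
# parameters between the two optima.
sfa = fine_tune(theta_a, grad_b, steps=100, lr=0.1,
                checkpoint=theta_a, avg_every=10, beta=0.5)

print("plain fine-tuning:", plain)
print("with averaging:   ", sfa)
```

In this toy setting, `beta` and `avg_every` trade off retention of the earlier task against fit to the new one: a larger `beta` or more frequent averaging pulls the parameters closer to the earlier checkpoint. Only one extra copy of the parameters (the checkpoint) is held, and no past data is used.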

Taxonomy

Topics: Intelligent Tutoring Systems and Adaptive Learning