Flattening Sharpness for Dynamic Gradient Projection Memory Benefits Continual Learning

Danruo Deng; Guangyong Chen; Jianye Hao; Qiong Wang; Pheng-Ann Heng

arXiv:2110.04593 · cs.LG · October 12, 2021 · 24 cites

TL;DR

This paper proposes FS-DGPM, a continual learning method that flattens the weight loss landscape and adaptively learns the importance of each stored basis from past tasks, with the aim of improving learning stability and reducing forgetting.

Contribution

It introduces Flattening Sharpness (FS) and a learnable soft weight for each basis in Gradient Projection Memory (GPM), so that the importance of past-task bases is adjusted dynamically during training.

Findings

01. Outperforms baseline methods in continual learning tasks.

02. Effectively reduces catastrophic forgetting.

03. Improves the generalization of learned skills.

Abstract

Backpropagation networks are notably susceptible to catastrophic forgetting, where a network tends to forget previously learned skills upon learning new ones. To address this 'sensitivity-stability' dilemma, most previous efforts have been devoted to minimizing the empirical risk with different parameter regularization terms and episodic memory, but have rarely explored the use of the weight loss landscape. In this paper, we investigate the relationship between the weight loss landscape and sensitivity-stability in the continual learning scenario, based on which we propose a novel method, Flattening Sharpness for Dynamic Gradient Projection Memory (FS-DGPM). In particular, we introduce a soft weight to represent the importance of each basis representing past tasks in GPM, which can be adaptively learned during the learning process, so that less important bases can be…
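The soft-weight idea in the abstract can be illustrated with a minimal NumPy sketch. It is not taken from the official repository: the function name `soft_projected_gradient`, the argument names, and the toy values are hypothetical. It assumes that the stored bases are orthonormal columns and that each soft weight lies in [0, 1]. How the weights are learned, and the sharpness-flattening step, are not shown.

```python
import numpy as np

def soft_projected_gradient(grad, bases, importance):
    """Remove from `grad` its components along `bases`, scaled by `importance`.

    grad:       (d,) gradient computed on the new task
    bases:      (d, k) matrix whose orthonormal columns span past-task directions
    importance: (k,) soft weights in [0, 1]; 1 blocks a direction, 0 frees it
    (All names here are illustrative assumptions, not the authors' API.)
    """
    coeffs = bases.T @ grad                      # coordinates of grad in the stored basis
    return grad - bases @ (importance * coeffs)  # subtract the weighted components

# Toy example in R^3: the stored bases are the first two coordinate axes.
bases = np.eye(3)[:, :2]
grad = np.array([1.0, 2.0, 3.0])

# Weights of 1 give a hard orthogonal projection onto the complement of the bases.
hard = soft_projected_gradient(grad, bases, np.array([1.0, 1.0]))

# A smaller weight on the second basis lets part of that component through.
soft = soft_projected_gradient(grad, bases, np.array([1.0, 0.25]))
```

With all weights equal to 1, the update is orthogonal to every stored direction. Lowering a weight releases part of the corresponding direction for the new task, which is one way to read the abstract's remark about less important bases.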

Code & Models

Repositories

danruod/fs-dgpm
PyTorch · Official

Videos

Flattening Sharpness for Dynamic Gradient Projection Memory Benefits Continual Learning · SlidesLive

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Advanced Neural Network Applications