Mitigating Forgetting in Continual Learning with Selective Gradient Projection

Anika Singh; Aayush Dhaulakhandi; Varun Chopade; Likhith Malipati; David Martinez; Kevin Zhu

arXiv:2603.26671·cs.LG·March 31, 2026

TL;DR

This paper introduces SFAO, a dynamic gradient projection method that mitigates catastrophic forgetting in continual learning by balancing plasticity and stability, reaching competitive accuracy at about 90% lower memory cost.

Contribution

The paper presents SFAO, a novel selective gradient projection technique that controls forgetting and reduces memory usage in continual learning.

Findings

01  SFAO achieves competitive accuracy on benchmarks.

02  SFAO reduces memory cost by 90%.

03  SFAO improves forgetting performance on MNIST.

Abstract

As neural networks are increasingly deployed in dynamic environments, they face the challenge of catastrophic forgetting, the tendency to overwrite previously learned knowledge when adapting to new tasks, resulting in severe performance degradation on earlier tasks. We propose Selective Forgetting-Aware Optimization (SFAO), a dynamic method that regulates gradient directions via cosine similarity and per-layer gating, enabling controlled forgetting while balancing plasticity and stability. SFAO selectively projects, accepts, or discards updates using a tunable mechanism with efficient Monte Carlo approximation. Experiments on standard continual learning benchmarks show that SFAO achieves competitive accuracy with markedly lower memory cost, a 90% reduction, and improved forgetting on MNIST datasets, making it suitable for resource-constrained scenarios.
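
The abstract names three outcomes for an update (accept, project, discard), decided per layer from cosine similarity against a Monte Carlo sample of stored directions. The sketch below is one possible reading of that description, not the authors' implementation: the thresholds `accept_at` and `discard_at`, the use of the worst-case (minimum) similarity, the ordering of the three rules, and every function name are assumptions made purely for illustration. It uses only the Python standard library and treats each layer's gradient as a flat list of floats.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v, eps=1e-12):
    # Cosine similarity; eps guards against division by zero.
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)) + eps)


def orthonormal_basis(vectors, eps=1e-10):
    # Gram-Schmidt; vectors that are (nearly) linearly dependent are skipped.
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(dot(w, w))
        if norm > eps:
            basis.append([wi / norm for wi in w])
    return basis


def gate_layer_gradient(grad, past_grads, accept_at=0.0, discard_at=-0.9,
                        n_samples=8, rng=None):
    """Return (decision, gradient) for one layer.

    Hypothetical rule (an assumption, not taken from the paper):
      - accept  if the worst sampled cosine similarity is >= accept_at
      - discard if it is <= discard_at
      - otherwise project the gradient onto the orthogonal complement
        of the sampled past directions
    """
    if not past_grads:
        return "accept", list(grad)
    sampler = rng if rng is not None else random
    # Monte Carlo approximation: compare against a random subset only.
    sample = sampler.sample(past_grads, min(n_samples, len(past_grads)))
    worst = min(cosine(grad, p) for p in sample)
    if worst >= accept_at:
        return "accept", list(grad)
    if worst <= discard_at:
        return "discard", [0.0] * len(grad)
    projected = list(grad)
    for b in orthonormal_basis(sample):
        c = dot(projected, b)
        projected = [pi - c * bi for pi, bi in zip(projected, b)]
    return "project", projected


def gate_all_layers(grads_by_layer, memory_by_layer, **kwargs):
    # Per-layer gating: each layer is decided independently.
    return {
        name: gate_layer_gradient(g, memory_by_layer.get(name, []), **kwargs)
        for name, g in grads_by_layer.items()
    }
```

For example, with a single stored direction `[1.0, 0.0]`, the gradient `[-1.0, 1.0]` falls between the two assumed thresholds, so it is projected and its component along the stored direction is removed. Sampling `n_samples` stored directions instead of scanning all of them is where the cost saving in this sketch comes from; how the paper stores and samples past directions is not stated in the abstract.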
