Why Do Neural Networks Forget: A Study of Collapse in Continual Learning

Yunqin Zhu; Jun Jin

arXiv:2603.04580·cs.LG·March 6, 2026

TL;DR

This paper investigates how structural collapse in neural networks correlates with catastrophic forgetting in continual learning, using effective rank metrics across various architectures and strategies.

Contribution

It introduces a detailed analysis linking model collapse to forgetting, emphasizing the importance of internal structure preservation for continual learning.

Findings

01. Forgetting correlates strongly with collapse in effective rank.

02. Different strategies help preserve model capacity and performance.

03. Structural collapse impacts plasticity and learning ability.

Abstract

Catastrophic forgetting is a major problem in continual learning, and many approaches have been proposed to reduce it. However, most of them are evaluated through task accuracy, which ignores the internal model structure. Recent research suggests that structural collapse leads to loss of plasticity, as evidenced by changes in effective rank (eRank). This suggests a link to forgetting: a network that loses the ability to expand its feature space to learn new tasks is forced to overwrite existing representations. Therefore, in this study, we investigate the correlation between forgetting and collapse by measuring both weight and activation eRank. Specifically, we evaluate four architectures (MLP, ConvGRU, ResNet-18, and Bi-ConvGRU) on the Split MNIST and Split CIFAR-100 benchmarks. Those models are trained through the SGD,…
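To make the eRank measurement concrete, here is a minimal sketch of the common entropy-based definition of effective rank: the exponential of the Shannon entropy of the normalized singular values. The abstract excerpt does not spell out the paper's exact formula, so this definition is an assumption, and the names `effective_rank` and `eps` are hypothetical. The same function could be applied to a weight matrix or to a batch-by-features activation matrix.

```python
import numpy as np

def effective_rank(matrix, eps=1e-12):
    """Entropy-based effective rank (assumed definition, not confirmed by the paper).

    Computes exp(H(p)), where p is the singular-value distribution of `matrix`
    normalized to sum to 1 and H is the Shannon entropy.
    """
    s = np.linalg.svd(np.asarray(matrix, dtype=np.float64), compute_uv=False)
    total = s.sum()
    if total <= eps:
        return 0.0  # all-zero matrix: treat as rank 0
    p = s / total
    p = p[p > eps]  # drop numerically zero entries so log(p) is finite
    entropy = -np.sum(p * np.log(p))
    return float(np.exp(entropy))

rng = np.random.default_rng(0)

# A dense random matrix spreads energy over many singular directions.
full = rng.standard_normal((64, 32))

# A rank-1 matrix stands in for a "collapsed" layer in this toy example.
collapsed = np.outer(rng.standard_normal(64), rng.standard_normal(32))

erank_full = effective_rank(full)
erank_collapsed = effective_rank(collapsed)
print(f"eRank (random):    {erank_full:.2f}")
print(f"eRank (collapsed): {erank_collapsed:.2f}")
```

Under this definition the value lies between 1 and min(rows, columns) for any nonzero matrix, so a drop toward 1 over the course of training would indicate that a layer is concentrating on fewer directions.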

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Memory Processes and Influences · Visual Attention and Saliency Detection