Rethinking Continual Learning with Progressive Neural Collapse

Zheng Wang; Wanhao Yu; Li Yang; Sen Lin

arXiv:2505.24254 · cs.LG · March 10, 2026


Open Access · 3 Reviews

TL;DR

This paper introduces Progressive Neural Collapse (ProNC), a continual learning framework that dynamically expands class prototypes to improve knowledge retention and class separation without relying on a fixed global simplex equiangular tight frame (ETF), which leads to superior performance.

Contribution

ProNC removes the need for a fixed global ETF in continual learning by progressively expanding class prototypes, enhancing flexibility and effectiveness.
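This page does not spell out ProNC's expansion rule (the reviews refer to the paper's Theorem 1). The sketch below is therefore only a hypothetical illustration of what "progressively expanding class prototypes" can mean.

- When new classes arrive, it builds a larger simplex ETF in the same feature space.
- It then rotates that ETF with orthogonal Procrustes, so the prototypes of already-seen classes move as little as possible.

NumPy, the function names `simplex_etf` and `expand_prototypes`, and the Procrustes step are all our assumptions, not the authors' method.

```python
import numpy as np

def simplex_etf(num_classes, dim, rng):
    """Simplex ETF via M = sqrt(K/(K-1)) * U @ (I - 11^T/K); U has orthonormal columns."""
    K = num_classes
    U, _ = np.linalg.qr(rng.standard_normal((dim, K)))
    return np.sqrt(K / (K - 1)) * U @ (np.eye(K) - np.ones((K, K)) / K)

def expand_prototypes(old, n_new, rng):
    """Hypothetical expansion step (not ProNC's Theorem 1): build a larger ETF,
    then rotate it so its first k_old columns best match the old prototypes."""
    dim, k_old = old.shape
    fresh = simplex_etf(k_old + n_new, dim, rng)
    # Orthogonal Procrustes: R minimizes ||R @ fresh[:, :k_old] - old||_F
    # over orthogonal matrices R.
    u, _, vt = np.linalg.svd(old @ fresh[:, :k_old].T)
    R = u @ vt
    return R @ fresh  # an orthogonal R leaves all pairwise angles unchanged

rng = np.random.default_rng(0)
old = simplex_etf(5, 64, rng)           # prototypes after task 1 (5 classes)
new = expand_prototypes(old, 5, rng)    # prototypes after task 2 (10 classes)
shift = np.linalg.norm(new[:, :5] - old, axis=0)  # per-class prototype displacement
print(shift.round(3))
```

A 5-class simplex and a 10-class simplex have different pairwise cosines (−1/4 versus −1/9). The old prototypes therefore cannot be preserved exactly, and the rotation only makes this unavoidable shift as small as possible.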

Findings

1. ProNC significantly outperforms existing methods in continual learning tasks.
2. ProNC maintains high class separability with minimal shifts from previous prototypes.
3. The framework is simple, efficient, and adaptable to various CL algorithms.

Abstract

Continual Learning (CL) seeks to build an agent that can continuously learn a sequence of tasks, where a key challenge, namely Catastrophic Forgetting, persists due to the potential knowledge interference among different tasks. On the other hand, deep neural networks (DNNs) have been shown to converge to a terminal state termed Neural Collapse during training, where all class prototypes geometrically form a static simplex equiangular tight frame (ETF). These maximally and equally separated class prototypes make the ETF an ideal target for model learning in CL to mitigate knowledge interference. Thus inspired, several studies have emerged very recently to leverage a fixed global ETF in CL, which, however, suffers from key drawbacks, such as impracticability and limited performance. To address these challenges and fully unlock the potential of ETF in CL, we propose Progressive Neural Collapse…
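The "maximally and equally separated class prototypes" of a simplex ETF can be made concrete with the standard construction from the Neural Collapse literature. For K classes, every prototype has unit norm and every pair has cosine −1/(K−1).

The sketch below is illustrative only. NumPy, the random orthonormal basis, and the function name `simplex_etf` are our assumptions, not the paper's code.

```python
import numpy as np

def simplex_etf(num_classes: int, dim: int, seed: int = 0) -> np.ndarray:
    """Return a (dim, num_classes) matrix whose columns form a simplex ETF.

    Standard construction: M = sqrt(K / (K - 1)) * U @ (I_K - 1 1^T / K),
    where U is a (dim, K) matrix with orthonormal columns (so dim >= K is needed).
    """
    K = num_classes
    if K < 2 or dim < K:
        raise ValueError("need num_classes >= 2 and dim >= num_classes")
    rng = np.random.default_rng(seed)
    U, _ = np.linalg.qr(rng.standard_normal((dim, K)))  # reduced QR: orthonormal columns
    centering = np.eye(K) - np.ones((K, K)) / K
    return np.sqrt(K / (K - 1)) * U @ centering

M = simplex_etf(num_classes=10, dim=64)
G = M.T @ M                                  # Gram matrix = pairwise cosines (unit-norm columns)
print(np.diag(G).min(), np.diag(G).max())    # both 1 up to rounding
print(G[0, 1], -1 / (10 - 1))                # equal pairwise cosine -1/(K-1)
```

The centering matrix removes the global mean, and the scale factor restores unit norm. The result is that all prototypes are equally far apart.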

Peer Reviews

Decision: ICLR 2026 Poster

Reviewer 01 · Rating 4 · Confidence 4

Strengths

1. The paper provides a clear exposition of the background NC theory and explains its own contributions in a well-structured manner.
2. The proposed method is well-motivated by the identified limitations of prior NC-based CL works, and the use of Theorem 1 introduces a moderately novel and theoretically grounded component.
3. Comprehensive experiments demonstrate consistent and noticeable gains over a range of baselines, supporting the empirical validity of the approach.

Weaknesses

1. The technical novelty remains limited compared with the preliminary works (Yang et al., 2023a,b). The paper reads largely as a continuation of this prior line of research, where NC-based CL formulations have already been thoroughly explored.
2. The second main contribution, the ProNC-based CL framework, largely mirrors the loss formulation of NCT (Yang et al., 2023b). While Section 3.1 introduces a genuinely new idea, Section 3.2 appears nearly identical to the corresponding part in NCT.
3. Som

Reviewer 02 · Rating 6 · Confidence 4

Strengths

1. Grounded in Neural Collapse geometry, offering an interpretable view of feature alignment in continual learning. Achieves strong results without complex contrastive or generative modules.
2. Works as a plug-in regularizer across different CL frameworks (e.g., ER, iCaRL, DER++).
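A "plug-in regularizer" typically means an extra loss term added on top of whatever the host CL method already optimizes. The exact ProNC loss is not given on this page.

As a hypothetical sketch, we assume a squared dot-regression penalty. It pulls each L2-normalized feature toward its class's fixed unit-norm prototype and is added to a stand-in base loss with weight `lam`. The names `etf_alignment_loss`, `total_loss`, and `lam` are illustrative, not the paper's.

```python
import numpy as np

def etf_alignment_loss(features, labels, prototypes):
    """Hypothetical plug-in term: squared dot-regression penalty pulling each
    L2-normalized feature toward its class's fixed unit-norm prototype."""
    h = features / np.linalg.norm(features, axis=1, keepdims=True)  # (batch, dim)
    target = prototypes[:, labels].T                                 # (batch, dim)
    cos = np.sum(h * target, axis=1)                                 # cosine per sample
    return 0.5 * np.mean((cos - 1.0) ** 2)

def total_loss(base_loss, features, labels, prototypes, lam=1.0):
    """base_loss stands in for whatever the host method (e.g. ER, iCaRL, DER++) computes."""
    return base_loss + lam * etf_alignment_loss(features, labels, prototypes)

K = 4
# Simplex ETF with U = identity (dim = K); each column has unit norm.
prototypes = np.sqrt(K / (K - 1)) * (np.eye(K) - np.ones((K, K)) / K)
labels = np.array([0, 1, 2, 3])
aligned = 3.0 * prototypes[:, labels].T   # features pointing exactly at their prototypes
print(etf_alignment_loss(aligned, labels, prototypes))          # ~0: already aligned
print(total_loss(0.7, -aligned, labels, prototypes, lam=0.5))   # 0.7 + 0.5 * 2.0
```

Because the prototypes stay fixed and the term only touches features and labels, the same function can be attached to any method that exposes its penultimate-layer features.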

Weaknesses

1. The method assumes clear task segmentation (task-aware setting); its applicability to task-free or online CL remains untested.
2. As the ETF expands over many tasks, orthogonality may gradually degrade; this possible effect is not analyzed experimentally.
3. Gram–Schmidt expansion could become unstable when the number of classes approaches the embedding dimension; only small-scale datasets and ResNet-18 (d ≤ 512) were tested.
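The reviewer's second and third concerns can be probed with a generic experiment. We do not know ProNC's exact expansion procedure, so this is a hypothetical sketch only. It extends an orthonormal basis task by task with modified Gram–Schmidt on random candidate vectors (an assumption) and tracks the worst deviation of QᵀQ from the identity as columns accumulate. It also shows that no more than d orthonormal columns fit in a d-dimensional feature space.

`gram_schmidt_extend` and `orthogonality_error` are illustrative names.

```python
import numpy as np

def gram_schmidt_extend(basis, n_new, rng, tol=1e-10):
    """Append n_new orthonormal columns to `basis` (dim x k, orthonormal columns)
    using modified Gram-Schmidt on random candidate vectors."""
    dim, k = basis.shape
    if k + n_new > dim:
        raise ValueError(f"cannot fit {k + n_new} orthonormal vectors in {dim} dimensions")
    cols = [basis[:, i] for i in range(k)]
    while len(cols) < k + n_new:
        v = rng.standard_normal(dim)
        for q in cols:              # subtract projections one at a time (modified GS)
            v = v - (q @ v) * q
        norm = np.linalg.norm(v)
        if norm > tol:              # skip numerically dependent candidates
            cols.append(v / norm)
    return np.stack(cols, axis=1)

def orthogonality_error(Q):
    """Largest absolute entry of Q^T Q - I."""
    return np.max(np.abs(Q.T @ Q - np.eye(Q.shape[1])))

rng = np.random.default_rng(0)
dim = 64
Q = gram_schmidt_extend(np.zeros((dim, 0)), 10, rng)   # first task: 10 classes
errors = []
for _ in range(5):                  # five more tasks of 10 classes each
    Q = gram_schmidt_extend(Q, 10, rng)
    errors.append(orthogonality_error(Q))
print(Q.shape, max(errors))

try:
    gram_schmidt_extend(Q, 10, rng)  # 60 + 10 > 64: no room left
    overflow_raised = False
except ValueError as err:
    overflow_raised = True
    print(err)
```

With well-conditioned random candidates, the error stays at floating-point level. Ill-conditioned candidates (for example, directions nearly inside the existing span) are where a single Gram–Schmidt pass can lose orthogonality, and a re-orthogonalization pass is the usual remedy.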

Reviewer 03 · Rating 2 · Confidence 5

Strengths

The idea of progressively adapting the ETF target during continual learning without knowing the number of total classes in advance is novel and addresses the shortcomings of fixed ETF methods for CL. The paper is built on a convincing motivation. The reasoning is coherent and carefully developed, making the overall argument both logical and easy to follow.

Weaknesses

Major Weaknesses
1. Questionable baseline performance values: in Table 1, several baseline results (Co²L, CILA, MNC³L, STAR) are notably lower than those reported in their original papers (where they surpass the results from the proposed ProNC). This discrepancy indicates possible reproduction or configuration issues, undermining the fairness and credibility of the comparison and invalidating the paper’s main “state-of-the-art” claim.
2. Missing baselines: some important


Taxonomy

Topics: Neural Networks and Applications · Cognitive Science and Education Research